Preface


Nowadays, various sensors are used to collect data about our environment, including RFID readers,
video cameras, and pressure and acoustic sensors. A typical sensor converts the physical energy of the object
under examination into electrical signals to deduce the information of interest. Intelligent sensing 
technology utilizes proper prior knowledge and machine learning techniques to enhance the process 
of information acquisition. Therefore, the whole sensing procedure can be performed at three levels: 
data, information, and knowledge. At the data level, each sample represents a measure of target 
energy within a certain temporal-spatial volume. For example, the pixel value of a video camera 
represents the number of photons emitted within a certain area during a specific time window. The 
information acquired by a sensor is represented by a probabilistic belief over random variables. For 
example, the information of target position can be represented by a Gaussian distribution. The 
knowledge acquired by a sensor is represented by a statistical model describing relations among 
random variables. For example, the behavior of a target can be represented by a hierarchical hidden 
Markov model; a situation can be represented by a hierarchical Bayesian network. 

The unprecedented advances in wireless networking technology enable the deployment of a 
large number of sensors over a wide area without limits of wires. However, the main challenge for 
developing wireless sensor networks is limited resources: power supply, computing complexity, and 
communication bandwidth. There are several possible solutions to overcome these obstacles: (1) 
development of “smart” sensing nodes that can reduce the data volume for information representa- 
tion as well as energy consumption; (2) development of distributed computing “intelligence” that 
allows data fusion, state estimation, and machine learning to be performed in a distributed way; 
and (3) development of ad hoc networking “intelligence” that can guarantee the connectivity of 
sensor systems under various conditions. In addition, energy harvesting technologies and passive sensor
networks have been developed to relax the limits of power supply.

The “intelligence” of sensor networks can be achieved in four ways: (1) spatial awareness, 
(2) data awareness, (3) group awareness, and (4) context awareness. Spatial awareness refers to an 
intelligent sensor network’s capability of knowing the relative geometric information of its members 
and targets under examination. This awareness is implemented through sensor self-calibration and 
target state estimation. Data awareness refers to an intelligent sensor network’s capability of 
reducing the data volume for information representation. This awareness is implemented through 
data prediction, data fusion, and statistical model building. Group awareness refers to an intelligent 
sensor network’s capability of all members knowing each other’s states and adjusting each member’s 
behavior according to other members’ actions. For example, distributed estimation and data learning 
are performed through the collaboration of a group of sensor nodes; ad hoc networking techniques 
maintain the connectivity of the whole network. Context awareness refers to an intelligent sensor 
network’s capability of changing its operation modes based on the knowledge of situations and
resources to achieve maximum efficiency of sensing performance. This awareness is implemented 
through proper context representation and distributed inference. 


Book Highlights 


So far there are no published books on intelligent sensor networks from a machine learning and
signal processing perspective. Unlike current sensor network books, this book has contributions 
from world-famous sensing experts that deal with the following issues: 


1. Emphasize “intelligent” designs in sensor networks: The intelligence of sensor networks can 
be developed through distributed machine learning or smart sensor design. Machine learn- 
ing of sensor networks can be performed in supervised, unsupervised, or semisupervised 
ways. In Part I, Chapter 1—Machine Learning Basics—we have provided a comprehen- 
sive picture of this area. Machine learning technology is a multidisciplinary field that 
includes probability and statistics, psychology, information theory, and artificial intelli- 
gence. Note that sensor networks often operate in very challenging conditions and need to 
accommodate environmental changes, hardware degradation, and inaccurate sensor read- 
ings. Thus they should learn and adapt to the changes in their operation environment. 
Machine learning can be used to achieve intelligent learning and adaptation. In Part I we 
have also included some chapters that emphasize these “intelligence” aspects of sensor net- 
works. For example, we have explained the components of the intelligent sensor (transducer) 
interfacing problem. We have also discussed how the network can intelligently choose the 
“best” assignment from the available sensors to the missions to maximize the utility of the 
network. 

2. Detail signal processing principles in intelligent sensor networks: Recently, a few advanced signal 
processing principles have been applied in sensor networks. For example, compressive sensing 
is an efficient and effective signal acquisition and sampling framework for sensor networks. 
It can save transmittal and computational power significantly at the sensor node. Its signal 
acquisition and compression scheme is very simple, so it is suitable for inexpensive sensors. 
As another example, a Kalman filter can be used to identify the sensor data pollution attacks 
in sensor networks. 

3. Elaborate important platforms on intelligent sensor networks: The platforms of intelligent 
sensor networks include smart sensors, RFID-assisted nodes, and distributed self-organization 
architecture. This book covers these platforms. For example, in Part III we have included
two chapters on RFID-based sensor function enhancement. The sensor/RFID integration 
can make the sensor better identify and trace surrounding objects. 

4. Explain interesting applications on intelligent sensor networks: Intelligent sensor networks can 
be used for target tracking, object identification, structural health monitoring, and other 
important applications. In most chapters, we have clearly explained how those “intelligent” 
designs can be used for realistic applications. For example, in structural health monitoring 
applications, we can embed the sensors in a concrete bridge. Thus, a bridge fracture can be 
detected in time. Those embedded sensors for field applications can be powered through 
solar-cell batteries. In healthcare applications, we can use medical sensors and intelligent 
body area sensor networks to achieve low-cost, 24/7 patient monitoring from a remote 
Targeted Audience
This book is suitable for the following types of readers: 


1. College students: This book can serve as a textbook or reference book for college courses on 
intelligent sensor networks. The courses could be offered in computer science, electrical and 
computer engineering, information technology and science, or other departments. 

2. Researchers: Because each chapter is written by leading experts, the contents will be very 
useful for researchers (such as graduate students and professors) who are interested in the 
application of artificial intelligence and signal processing in sensor networks. 

3. Computer scientists: We have provided many computing algorithms on machine learning and 
data processing in this book. Thus, computer scientists could refer to those principles in their 
own design. 

4. Engineers: We have also provided useful intelligent sensor node/sensor network design
examples. Thus, company engineers could use those principles in their product design. 


Book Architecture 


This book includes three parts as follows: 

Part I—Machine Learning: This part describes the application of machine learning and other 
artificial intelligence principles in sensor network intelligence. It covers the basics of machine 
learning, including smart sensor/transducer architecture and data representation for intelligent 
sensors, modal parameter-based structural health monitoring using wireless smart sensors,
sensor-mission assignment problems in which the objective is to maximize the overall utility of the 
network under different constraints, reducing the amount of communication in sensor networks 
by means of learning techniques, neurodisorder patient monitoring via gait sensor networks, and 
cognitive radio-based sensor networks. 

Part II—Signal Processing: This part describes the optimization of sensor network performance
based on digital signal processing (DSP) techniques. It includes the following important topics: 
cross-layer integration of routing and application-specific signal processing, on-board image pro- 
cessing in wireless multimedia sensor networks for intelligent transportation systems, and essential 
signal processing and data analysis methods to effectively handle and process the data acquired 
with the sensor networks for civil infrastructure systems. It also includes a paradigm for validating 
the extent of spatiotemporal associations among data sources to enhance data cleaning in sensor 
networks, a sensor stream reduction application, a basic methodology that is composed of four 
phases (characterization, reduction tools, robustness, and conception), discussions on how the 
compressive sensing (CS) can be used as a useful framework for the sensor networks to compress 
and acquire signals and save transmittal and computational power in sensors, and the use of Kalman 
filters for attack detection in a water system sensor network that consists of water level sensors and 
velocity sensors. 

Part III—Networking: This part focuses on detailed network protocol design in order to achieve
an intelligent sensor networking scenario. It covers the following topics: energy-efficient oppor- 
tunistic routing protocol for sensor networking; multi-agent-driven wireless sensor cooperation for 
limited resource allocation; an illustration of how distributed event detection can achieve both 
high accuracy and energy efficiency; blanket/sweep/barrier coverage issues in sensor networks; linear
state estimators that, in addition to local estimation, perform management procedures that help the
network of state estimators to self-organize; low-power solution for wireless passive
sensor network; the fusion of pre/post RFID correction techniques to reduce anomalies; RFID
systems and sensor integration for tele-medicine; and the new generation of intrusion detection 
sensor networks. 

Disclaimer: We have tried our best to provide credits to all cited publications in this book. 
We sincerely thank all authors who have published materials on intelligent sensor network and 
who have directly/indirectly contributed to this book through our citations. If you have questions 
on the contents of this book, please contact the editors Fei Hu (fei@eng.ua.edu) or Qi Hao 
(qh@eng.ua.edu). We will correct any errors and thus improve this book in future editions. 


MATLAB® is a registered trademark of The MathWorks, Inc. For product information, please
contact: 


The MathWorks, Inc. 

3 Apple Hill Drive 

Natick, MA, 01760-2098 USA 
Tel: 508-647-7000 

Fax: 508-647-7001 

E-mail: info@mathworks.com 
Web: www.mathworks.com 
Chapter 1


Machine Learning Basics 


Krasimira Kapitanova and Sang H. Son 
The goal of machine learning is to design and develop algorithms that allow systems to use empirical
data, experience, and training to evolve and adapt to changes that occur in their environment. 
A major focus of machine learning research is to automatically induce models, such as rules and 
patterns, from the training data it analyzes. As shown in Figure 1.1, machine learning combines
techniques and approaches from various areas, including probability and statistics, psychology,
information theory, and artificial intelligence.

Figure 1.1 Machine learning is a broad discipline, combining approaches from many different
areas. [Figure labels: probability and statistics, control theory, psychology, philosophy,
information theory, neurobiology, artificial intelligence.]

Wireless sensor network (WSN) applications operate in very challenging conditions, where they 
constantly have to accommodate environmental changes, hardware degradation, and inaccurate 
sensor readings. Therefore, in order to maintain sufficient operational correctness, a WSN applica- 
tion often needs to learn and adapt to the changes in its running environment. Machine learning 
has been used to help address these issues. A number of machine learning algorithms have been 
employed in a wide range of sensor network applications, including activity recognition, health
care, education, and improving the efficiency of heating, ventilating, and air conditioning
(HVAC) systems.

The abundance of machine learning algorithms can be divided into two main classes: supervised 
and unsupervised learning, based on whether the training data instances are labeled. In supervised 
learning, the learner is supplied with labeled training instances, where both the input and the 
correct output are given. In unsupervised learning, the correct output is not provided with the 
input. Instead, the learning program must rely on other sources of feedback to determine whether 
or not it is learning correctly. A third class of machine learning techniques, called semi-supervised 
learning, uses a combination of both labeled and unlabeled data for training. Figure 1.2 shows the 
relationship between these three machine learning classes. 

In this chapter, we survey machine learning algorithms in sensor networks from the
perspective of what types of applications they have been used for. We give examples from all 
three machine learning classes and discuss how they have been applied in a number of sensor 
network applications. We present the most frequently used machine learning algorithms, including 
clustering, Bayes probabilistic models, Markov models, and decision trees. We also analyze the
challenges, advantages, and drawbacks of using different machine learning algorithms. Figure 1.3
shows the machine learning algorithms introduced in this chapter.

Figure 1.2 Machine learning algorithms are divided into supervised learning, which uses labeled
training data, and unsupervised learning, where labeled training data is not available. A third
class of machine learning techniques, semi-supervised learning, makes use of both labeled and
unlabeled training data.


1.1 Supervised Learning 


In supervised learning, the learner is provided with labeled input data. This data contains a sequence 
of input/output pairs of the form $(x_i, y_i)$, where $x_i$ is a possible input and $y_i$ is the correctly labeled
output associated with it. The aim of the learner in supervised learning is to learn the mapping
from inputs to outputs. The learning program is expected to learn a function $f$ that accounts for the
input/output pairs seen so far, $f(x_i) = y_i$, for all $i$. This function $f$ is called a classifier if the output
is discrete and a regression function if the output is continuous. The job of the classifier/regression 
function is to correctly predict the outputs of inputs it has not seen before. For example, the inputs 
can be a set of sensor firings and the outputs can be the activities that have caused those sensor 
nodes to fire. 

The execution of a supervised learning algorithm can be divided into five main steps (Figure 1.4).

Step 1 is to determine what training data is needed and to collect that data. Here we need to answer 
two questions: “What data is necessary?” and “How much data do we need?” The designers have to 
decide what training data can best represent real-world scenarios for the specific application. They 
also need to determine how much training data should be collected. Although the more training 
data we have, the better we can train the learning algorithm, collecting training data and providing 
correct labels can often be expensive and laborious. Therefore, an application designer always strives 
to maintain the size of the training data not only large enough to provide sufficient training but 
also small enough to avoid any unnecessary costs associated with data collection and labeling. 

Step 2 is to identify the feature set, also called feature vector, to be used to represent the 
input. Each feature in the feature set represents a characteristic of the objects/events that are being 
classified. There is a trade-off between the size of the feature vector and the classification accuracy of 
the machine learning algorithm. A large feature vector significantly increases the complexity of the
classification. However, using a small feature vector, which does not contain sufficient description
of the objects/events, could lead to poor classification accuracy. Therefore, the feature vector should
be sufficiently large to represent the important features of the object/event and small enough to
avoid excessive complexity.

Figure 1.3 Classification of the machine learning algorithms most widely used in WSN
applications.

Figure 1.4 The stages of supervised machine learning. [Recovered figure label: "Determine the
type of training examples".]

Step 3 is to select a suitable learning algorithm. A number of factors have to be considered when 
choosing a learning algorithm for a particular task, including the content and size of the training 
dataset, noise in the system, accuracy of the labeling, and the heterogeneity and redundancy of the 
input data. We also have to evaluate the requirements and characteristics of the sensor network
application itself. For example, for an activity recognition application, the duration of sensor use 
plays a significant role in determining the activity being executed. Therefore, to achieve high activity 
recognition accuracy, we would prefer to use machine learning algorithms that can explicitly model 
state duration. 

The most frequently used supervised machine learning algorithms include support vector 
machines, naive Bayes classifiers, decision trees, hidden Markov models, conditional random
fields, and k-nearest neighbor algorithms. There are also a number of approaches that have been
applied to improve the performance of the chosen classifiers, such as bagging, boosting, and using 
classifier ensembles. Each of the algorithms has its advantages and disadvantages, which make it 
suitable for some types of applications but inappropriate for others. 

Step 4 is to train the chosen learning algorithm using the collected training data. In this step, 
the algorithm learns the function that best matches the input/output training instances. 

Step 5 is to evaluate the algorithm’s accuracy. We assess the accuracy of the learned function
with the help of a testing dataset, which is different from the training dataset.
In this step, we evaluate how accurately the machine learning algorithm classifies entries from the 
testing set based on the function it has learned through the training dataset. 
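
The five steps above can be sketched end to end in a few lines of Python. This is only an illustration under stated assumptions: the feature vector (minutes of stove use, number of cupboard openings), the data values, and the choice of a 1-nearest-neighbor learner are all hypothetical and are not taken from any of the deployments discussed in this chapter.

```python
# Steps 1-2 (assumed data): each example is a feature vector
# (stove_minutes, cupboard_openings) paired with an activity label.
training_data = [
    ((12, 0), "cooking"),
    ((20, 1), "cooking"),
    ((15, 2), "cooking"),
    ((0, 1), "getting a drink"),
    ((0, 2), "getting a drink"),
    ((1, 1), "getting a drink"),
]
# The testing dataset is kept separate from the training dataset (Step 5).
testing_data = [
    ((18, 1), "cooking"),
    ((0, 3), "getting a drink"),
    ((10, 0), "cooking"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(examples):
    # Steps 3-4: a 1-nearest-neighbor learner simply memorizes the examples.
    return list(examples)


def predict(model, features):
    # Return the label of the stored example closest to the input.
    _, label = min(model, key=lambda ex: squared_distance(ex[0], features))
    return label


def accuracy(model, examples):
    # Step 5: fraction of examples whose predicted label matches the true label.
    correct = sum(predict(model, x) == y for x, y in examples)
    return correct / len(examples)


model = train(training_data)
test_accuracy = accuracy(model, testing_data)
print(f"accuracy on held-out data: {test_accuracy:.2f}")
```

On such a tiny, cleanly separated toy dataset the held-out accuracy says little; with real sensor data, the gap between training and testing accuracy is what indicates whether the learned function generalizes.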

Different supervised learning algorithms have been used and evaluated experimentally in a variety 
of sensor network applications. In the rest of this section, we describe some of the algorithms that 
are most frequently used in WSN applications. 


1.1.1 Decision Trees 


Decision trees are characterized by fast execution time, ease in the interpretation of the rules, and 
scalability for large multidimensional datasets (Cabena et al. 1998, Han 2005). The goal of decision 
tree learning is to create a model that predicts the value of the output variable based on the input 
variables in the feature vector. Each internal node corresponds to one of the feature vector variables. From
every such node, there are edges to its children, with one edge for each of the possible values (or
ranges of values) of the input variable associated with the node. Each leaf represents a possible value
for the output variable. The output variable is determined by following a path that starts at the 
root and is guided by the values of the input variables. 

Figure 1.5 shows an example decision tree for a sensor network activity detection application. In 
this scenario, we assume that there are only two events of interest in the kitchen: cooking and getting 
a drink. The decision tree uses sensor node firings to distinguish between these two activities. For 
example, if there is movement in the kitchen and the stove is being used, the algorithm determines 
that the residents must be cooking. However, if there is movement in the kitchen, the stove is not 
being used, and somebody opens the cups cupboard, the algorithm decides that the activity being 
performed at the moment is getting a drink. This is a simple example illustrating how decision trees 
can be applied to sensor network applications. In reality, the decision trees that are learned by real 
applications are much more complex. 
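
The two paths described above can be written directly as nested tests. The sketch below is hand-coded rather than learned, and the function name, argument names, and the "no activity of interest" label for the branches the text does not describe are assumptions made for illustration.

```python
def classify_kitchen_activity(motion_in_kitchen, stove_in_use, cups_cupboard_opened):
    """Follow a path from the root to a leaf using boolean sensor firings."""
    if not motion_in_kitchen:
        # Branch not described in the text; placeholder label (assumption).
        return "no activity of interest"
    if stove_in_use:
        return "cooking"
    if cups_cupboard_opened:
        return "getting a drink"
    # Branch not described in the text; placeholder label (assumption).
    return "no activity of interest"


print(classify_kitchen_activity(True, True, False))
print(classify_kitchen_activity(True, False, True))
```

Each `if` corresponds to an internal node of the tree, and each `return` to a leaf.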

The C4.5 algorithm is one of the well-known, top-down, greedy search algorithms for building 
decision trees (Quinlan 1993). The algorithm uses entropy and information gain metrics to induce 
a decision tree. The C4.5 algorithm has been used for activity recognition in the PlaceLab project 
at MIT (Logan et al. 2007). The authors of the project monitored a home deployed with over 900 
sensors, including wired reed switches, current and water flow inputs, object and person motion 
detectors, and radio frequency identification (RFID) tags. They collected data for 43 typical house 
activities, and C4.5 was one of the classifiers used by their activity recognition approach. 
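
To make the entropy and information gain metrics concrete, the following sketch computes both for a small, hypothetical set of labeled sensor firings. It is not the full C4.5 procedure, which additionally normalizes the gain (gain ratio), handles continuous attributes, and prunes the tree; the data and function names are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature_index):
    """Reduction in label entropy obtained by splitting on one feature."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature_index], []).append(label)
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


# Hypothetical data: each row is (stove_fired, cupboard_fired).
rows = [(1, 0), (1, 1), (1, 0), (0, 1), (0, 1), (0, 0)]
labels = ["cooking", "cooking", "cooking", "drink", "drink", "drink"]

gain_stove = information_gain(rows, labels, 0)
gain_cupboard = information_gain(rows, labels, 1)
print(f"gain(stove) = {gain_stove:.3f}, gain(cupboard) = {gain_cupboard:.3f}")
```

A greedy, top-down learner would place the feature with the larger gain (here the stove sensor, which separates the two classes perfectly) at the root and then recurse on each partition.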
Figure 1.5 Example decision tree for an activity detection application. In this scenario, we are
interested only in two of the kitchen activities: cooking and getting a drink. The decision tree is 
used to determine which one of these activities is currently occurring based on the sensor nodes 
that are firing in the kitchen. 


C4.5 was used for target recognition in an underwater wireless sensor surveillance system (Cayirci 
et al. 2006). Each node in the network was equipped with multiple microsensors of various types,
including acoustic, magnetic, radiation, and mechanical sensors. The readings from these sensors 
were used by the decision tree recognition algorithms to classify submarines, small delivery vehicles, 
mines, and divers. 

C4.5 was also used as part of an algorithm to automatically recognize physical activities and their 
intensities (Tapia et al. 2007). The algorithm monitors the readings of triaxial wireless accelerom- 
eters and wireless heart rate monitors. The approach was evaluated using datasets consisting of 10 
physical gymnasium activities collected from a total of 21 people. 


1.1.2 Bayesian Network Classifiers 


Bayesian probability interprets the concept of probability as a degree of belief. A Bayesian classifier
analyzes the feature vector describing a particular input instance and assigns the instance to the
most likely class. A Bayesian classifier is based on applying Bayes’ theorem to evaluate the likelihood
of particular events. Bayes’ theorem gives the relationship between the prior and posterior beliefs
for two events. In Bayes’ theorem, $P(A)$ is the prior (initial) belief in $A$, and $P(A \mid B)$ is the posterior belief
in $A$ after $B$ has been encountered, i.e., the conditional probability of $A$ given $B$. Similarly for $B$,
$P(B)$ is the prior (initial) belief in $B$, and $P(B \mid A)$ is the posterior belief in $B$ given $A$. Assuming that
$P(B) \neq 0$, Bayes’ theorem states that

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}.$$
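
As a small worked example of Bayes’ theorem, let A be "the resident is cooking" and B be "the stove sensor fires". The probabilities below are made-up values chosen only for illustration, and P(B) is expanded with the law of total probability.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) = P(B|A) P(A) / P(B) for a binary event A."""
    # P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    evidence = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    if evidence == 0:
        raise ValueError("Bayes' theorem requires P(B) != 0")
    return p_b_given_a * prior_a / evidence


# Assumed values: P(A) = 0.2, P(B|A) = 0.9, P(B|not A) = 0.05.
p_cooking_given_stove = posterior(0.2, 0.9, 0.05)
print(f"P(cooking | stove fires) = {p_cooking_given_stove:.3f}")
```

With these numbers, P(B) = 0.9 × 0.2 + 0.05 × 0.8 = 0.22, so the belief in cooking rises from the prior 0.2 to a posterior of 0.18 / 0.22 ≈ 0.818 once the stove sensor fires.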

The Bayesian network is a probabilistic model that represents a set of random variables and 
their conditional dependencies via a directed acyclic graph. For example, a Bayesian network could
represent the probabilistic relationships between activities and sensor readings. Given a set of sensor 
readings, the Bayesian network can be used to evaluate the probabilities that various activities are 
being performed. 

Bayesian networks have a number of advantages. Since a Bayes network relates only nodes that 
are probabilistically related by a causal dependency, an enormous saving of computation can result. 
Therefore, there is no need to store all possible configurations of states. Instead, all that needs to be 
stored is the combinations of states between sets of related parent-child nodes. Also, Bayes networks
are extremely adaptable. They can be started off small, with limited knowledge about the domain, 
and grow as they acquire new knowledge. 

Bayes networks have been applied to a variety of sensor fusion problems, where data from various 
sources must be integrated in order to build a complete picture of the current situation. They 
have also been used in monitoring and alerting applications where the application should recognize 
whether specific events have occurred and decide if an alert or a notification should be sent. 
Further, they have been applied to a number of activity recognition applications and evaluated using 
numerous single- and multiple-resident home deployments. 

Bayesian networks can be divided into two groups, static and dynamic, based on whether they 
are able to model temporal aspects of the events/activities of interest. We introduce an example 
classifier for each of these two classes: static naive Bayes classifier and dynamic naive Bayes classifier. 


1.1.2.1 Static Bayesian Network Classifiers 


A very commonly used representative of the static Bayesian networks is the static naive Bayes classifier.
Learning Bayesian classifiers can be significantly simplified by making the naive assumption that the 
features describing a class are independent. The classifier makes the assumption that the presence or 
absence of a feature of a class is unrelated to the presence or absence of any of the other features in 
the feature vector. The naive Bayes classifier is one of the most practical learning methods, and it has 
been widely used in many sensor network applications, including activity recognition in residences for
the elderly (van Kasteren and Kröse 2007), activity recognition in the PlaceLab project at MIT (Logan
et al. 2007), outlier detection (Janakiram et al. 2006), and body sensor networks (Maurer et al. 2006). 

Figure 1.6 shows a naive Bayesian model for the recognition of an activity. In this scenario, the
activity at time $t$, $\mathit{activity}_t$, is independent of any previous activities. It is also assumed that the
sensor data $R_t$ is dependent only on $\mathit{activity}_t$.

Naive Bayes classifiers have the following advantages: 


1. They can be trained very efficiently.

2. They are very well suited for categorical features.

3. In spite of their naive design and the independence assumptions, naive Bayes classifiers have
performed very well in many complex real-world situations. They can work with more than
1000 features.

4. They are good for combining multiple models and can be used in an iterative way.


Figure 1.6 Static Bayesian network: $\mathit{activity}_t$ denotes the activity being detected at time $t$, and
$R_t^i$ represents the data from sensor $i$ at time $t$.
A disadvantage of naive Bayes classifiers is that, if the conditional independence assumption does not hold, i.e., there is
dependence between the features of the analyzed classes, they may not be a good model. Also, naive
Bayes classifiers assume that all attributes that influence a classification decision are observable and 
represented. Despite these drawbacks, experiments have demonstrated that naive Bayes classifiers 
are very accurate in a number of problem domains. Simple naive Bayes networks have even been 
proved comparable with more complex algorithms, such as decision trees (Tapia 2004). 
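
A minimal sketch of a static naive Bayes classifier over binary sensor firings is shown below. The sensors (stove, cupboard, fridge), the training examples, and the use of add-one (Laplace) smoothing are assumptions made for illustration; they do not come from the cited studies.

```python
import math
from collections import Counter


def train_naive_bayes(examples, alpha=1.0):
    """Estimate P(class) and P(sensor i fires | class) from labeled examples.

    alpha is an add-one smoothing constant (an assumption, not from the text)
    that keeps every probability strictly between 0 and 1.
    """
    class_counts = Counter(label for _, label in examples)
    n_features = len(examples[0][0])
    fire_counts = {label: [0] * n_features for label in class_counts}
    for features, label in examples:
        for i, value in enumerate(features):
            fire_counts[label][i] += value
    model = {}
    for label, count in class_counts.items():
        prior = count / len(examples)
        p_fire = [(fire_counts[label][i] + alpha) / (count + 2 * alpha)
                  for i in range(n_features)]
        model[label] = (prior, p_fire)
    return model


def classify(model, features):
    """Pick the class with the highest (log) posterior score."""
    scores = {}
    for label, (prior, p_fire) in model.items():
        score = math.log(prior)
        # Independence assumption: per-feature log-likelihoods simply add up.
        for value, p in zip(features, p_fire):
            score += math.log(p if value else 1.0 - p)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical examples: (stove, cupboard, fridge) firings and the activity.
examples = [
    ((1, 0, 1), "cooking"), ((1, 0, 0), "cooking"), ((1, 1, 1), "cooking"),
    ((0, 1, 1), "drink"), ((0, 1, 0), "drink"), ((0, 0, 1), "drink"),
]
nb_model = train_naive_bayes(examples)
print(classify(nb_model, (1, 1, 0)), classify(nb_model, (0, 1, 1)))
```

Training reduces to counting, which is why these classifiers can be trained very efficiently; the price is that correlated sensors are treated as if they provided independent evidence.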


1.1.2.2 Dynamic Bayesian Network Classifiers 


Another disadvantage of static Bayesian networks is that they cannot model the temporal aspect 
of sensor network events. Dynamic Bayesian networks, however, are capable of representing a 
sequence of variables, where the sequence can be consecutive readings from a sensor node. Therefore, 
dynamic Bayesian networks, although more complex, might be better suited for modeling events 
and activities in sensor network applications. 

Figure 1.7 shows a naive dynamic Bayesian model, where the $\mathit{activity}_{t+1}$ variable is directly
influenced only by the previous variable, $\mathit{activity}_t$. The assumption with these models is that an
event can cause another event in the future, but not vice versa. Therefore, directed arcs between 
events/activities should flow forward in time and cycles are not allowed. 
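
One common way to use such a model is recursive filtering: the belief over the current activity is first propagated forward through the activity-to-activity dependency and then corrected with the new sensor readings. The sketch below assumes two activities, two binary sensors, and made-up transition and firing probabilities; it is not the implementation used in the cited work.

```python
def filter_step(belief, transition, p_fire, readings):
    """One predict/update step of a naive dynamic Bayes classifier.

    belief:     {activity: P(activity at t-1 | readings so far)}
    transition: {previous activity: {next activity: probability}}
    p_fire:     {activity: [P(sensor i fires | activity), ...]}
    readings:   tuple of 0/1 sensor firings at time t
    """
    # Predict: arcs only flow forward in time, from the previous activity.
    predicted = {
        a: sum(belief[prev] * transition[prev][a] for prev in belief)
        for a in belief
    }
    # Update: naive (independent) sensor likelihoods given the activity.
    unnormalized = {}
    for a, prior in predicted.items():
        likelihood = 1.0
        for value, p in zip(readings, p_fire[a]):
            likelihood *= p if value else 1.0 - p
        unnormalized[a] = prior * likelihood
    norm = sum(unnormalized.values())
    return {a: v / norm for a, v in unnormalized.items()}


# Assumed parameters for two activities and sensors (stove, cupboard).
transition = {
    "cooking": {"cooking": 0.8, "idle": 0.2},
    "idle": {"cooking": 0.1, "idle": 0.9},
}
p_fire = {"cooking": [0.9, 0.3], "idle": [0.05, 0.1]}

belief = {"cooking": 0.5, "idle": 0.5}
for readings in [(1, 0), (1, 0), (0, 0)]:
    belief = filter_step(belief, transition, p_fire, readings)
    print(readings, {a: round(p, 3) for a, p in belief.items()})
```

Unlike the static classifier, the belief after each step depends on the whole history of readings, carried forward through the previous belief.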

Dynamic models have been used in activity recognition applications. A naive dynamic Bayes classifier
is compared to a naive static Bayes classifier using two publicly available datasets (van Kasteren
and Kröse 2007). The dynamic Bayes classifier is shown to achieve higher activity recognition accuracy
than the static model. A dynamic Bayesian filter was successfully applied to the simultaneous
tracking and activity recognition problem, which exploits the synergy between location and activity to
provide the information necessary for automatic health monitoring (Wilson and Atkeson 2005).


1.1.3 Markov Models 


A process is considered to be Markov if it exhibits the Markov property, which is the lack of 
memory, i.e., the conditional probability distribution of future states of the process depends only 
on the present state, and not on the events that preceded it. We discuss two types of Markov 
models: hidden Markov model (HMM) and hidden semi-Markov model (HSMM). 


1.1.3.1 Hidden Markov Model 


An HMM can be viewed as a simple dynamic Bayesian network. When using an HMM, the system 
is assumed to be a Markov process with unobserved (hidden) states. Even though the sequence of
states is hidden, the output, which is dependent on the state, is visible. Therefore, at each time
step, there is a hidden variable and an observable output variable. In sensor network applications,
the hidden variable could be the event or activity performed and the observable output variable is
the vector of sensor readings.

Figure 1.7 An example of a naive dynamic Bayesian network.

Figure 1.8 Hidden Markov model example. The states of the system $Y_t$ are hidden, but their
corresponding outputs $X_t$ are visible.

Figure 1.8 shows an example HMM, where the states of the system Y are hidden, but the output 
variables X are visible. There are two dependency assumptions that define this model, represented 
by the directed arrows in the figure. 


1. Markov assumption: the hidden variable at time $t$, namely $Y_t$, depends only on the previous
hidden variable $Y_{t-1}$ (Rabiner 1989).
2. The observable output variable at time $t$, namely $X_t$, depends only on the hidden variable $Y_t$.


With these assumptions, we can specify an HMM using the following three probability 
distributions: 


1. Initial-state distribution: the distribution over initial states $p(Y_1)$

2. Transition distribution: the distribution $p(Y_t \mid Y_{t-1})$, which represents the probability of going
from one state to the next

3. Observation distribution: the distribution $p(X_t \mid Y_t)$, which indicates the probability that the
hidden state $Y_t$ would generate observation $X_t$


Learning the parameters of these distributions corresponds to maximizing the joint probability 
distribution p(X, Y) of the paired observation and label sequences in the training data. Modeling 
the joint probability distribution p(X, Y) makes HMMs a generative model. 
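
To make the factorization concrete, the following minimal sketch evaluates the joint probability p(X, Y) = p(Y_1) p(X_1|Y_1) ∏ p(Y_t|Y_{t-1}) p(X_t|Y_t) for a small HMM. The three states loosely follow the smart thermostat example; the observation symbols and all probability values are hypothetical and chosen only for illustration. The most likely state sequence is found here by brute-force enumeration, which is feasible only for very short sequences; practical systems use dynamic programming (the Viterbi algorithm) instead.

```python
import itertools

# Hypothetical three-state occupancy HMM; every number below is made up.
states = ["unoccupied", "active", "sleeping"]
observations = ["no_motion", "motion"]

initial = {"unoccupied": 0.5, "active": 0.3, "sleeping": 0.2}   # p(Y_1)
transition = {                                                   # p(Y_t | Y_{t-1})
    "unoccupied": {"unoccupied": 0.8, "active": 0.2, "sleeping": 0.0},
    "active":     {"unoccupied": 0.1, "active": 0.7, "sleeping": 0.2},
    "sleeping":   {"unoccupied": 0.0, "active": 0.1, "sleeping": 0.9},
}
emission = {                                                     # p(X_t | Y_t)
    "unoccupied": {"no_motion": 0.95, "motion": 0.05},
    "active":     {"no_motion": 0.30, "motion": 0.70},
    "sleeping":   {"no_motion": 0.90, "motion": 0.10},
}


def joint_probability(xs, ys):
    """Joint probability of an observation sequence xs and a state sequence ys."""
    p = initial[ys[0]] * emission[ys[0]][xs[0]]
    for t in range(1, len(xs)):
        p *= transition[ys[t - 1]][ys[t]] * emission[ys[t]][xs[t]]
    return p


def most_likely_states(xs):
    """Brute-force search over all state sequences (exponential in len(xs))."""
    candidates = itertools.product(states, repeat=len(xs))
    return max(candidates, key=lambda ys: joint_probability(xs, ys))
```

For instance, `most_likely_states(["motion", "motion"])` selects the sequence in which the home is in the active state at both time steps, because that labeling has the largest joint probability under the assumed parameters.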

HMMs have been extensively used in many sensor network applications. Most of the earlier 
work on activity recognition used HMMs to recognize the activities from sensor data (Patterson 
et al. 2005, Wilson and Atkenson 2005, van Kasteren et al. 2008). An HMM is also used in the 
smart thermostat project (Lu et al. 2010). The smart thermostat technology automatically senses 
the occupancy and sleep patterns in a home and uses these patterns to operate the 
heating, ventilation, and air conditioning (HVAC) system in the home. The authors employ an HMM 
to estimate the probability of the home being in each of three states: unoccupied, occupied with 
the residents active, and occupied with the residents sleeping. HMMs were also applied in a 
biometric identification application for multi-resident homes (Srinivasan et al. 2010). In this project, 
height sensors were mounted above the doorways in a home and an HMM was used to identify 
the location of each of the residents. 

Figure 1.9 Hidden semi-Markov model. Each hidden state y_i is characterized by a start position 
s_i and a duration d_i. This means that the system is in state y_i from time s_i to time s_i + d_i. 


A weakness of conventional HMMs is their lack of flexibility in modeling state durations. In an 
HMM, the probability of leaving a state is the same at every time step, no matter how long the 
system has already been in that state. This limits the modeling capability. For example, the activity 
preparing dinner typically spans at least several minutes; preparing dinner in less than a couple of 
minutes is unusual. The geometric distribution that HMMs implicitly use for state durations cannot 
represent events for which shorter durations are less probable. 
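
The limitation can be checked numerically. If a state has self-transition probability a, the probability that it lasts exactly d time steps is a^(d-1)(1 - a), which is largest at d = 1 for any a < 1. The sketch below tabulates this distribution; the value a = 0.9 and the function name are illustrative assumptions.

```python
def hmm_duration_pmf(self_transition, max_duration):
    """P(D = d) for d = 1..max_duration when the state persists with
    probability `self_transition` at every step (geometric distribution)."""
    return [
        self_transition ** (d - 1) * (1.0 - self_transition)
        for d in range(1, max_duration + 1)
    ]


# Hypothetical state that persists with probability 0.9 per time step.
pmf = hmm_duration_pmf(self_transition=0.9, max_duration=30)

# The most probable duration is the shortest one, whatever value is chosen above.
most_likely_duration = 1 + pmf.index(max(pmf))
```

Even with a high self-transition probability, the single most probable duration is one time step, so an activity whose short durations are rare cannot be represented faithfully.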


1.1.3.2 Hidden Semi-Markov Models 


An HSMM differs from an HMM in that HSMMs explicitly model the duration of hidden states. 
This means that the probability of there being a change in the hidden state depends on the amount 
of time that has elapsed since entry into the current state (Figure 1.9). 
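
One way to see this dependence is to compute, for a given duration distribution, the probability of leaving a state at step d given that the state has already lasted d - 1 steps. The sketch below does this for the implicit geometric duration of an HMM and for an explicit duration distribution of the kind an HSMM can use. Both distributions are hypothetical, and the helper name `hazard` is our own.

```python
def hazard(duration_pmf):
    """h(d) = P(D = d) / P(D >= d): the chance of leaving a state at step d,
    given that the state has already lasted d - 1 steps."""
    remaining = 1.0  # P(D >= d) for the current d
    rates = []
    for p in duration_pmf:
        rates.append(p / remaining if remaining > 1e-12 else 1.0)
        remaining -= p
    return rates


# Implicit HMM duration with self-transition probability 0.8 (first 8 steps).
geometric = [0.8 ** (d - 1) * 0.2 for d in range(1, 9)]
# Hypothetical explicit HSMM duration: very short stays never occur.
explicit = [0.0, 0.0, 0.05, 0.15, 0.30, 0.30, 0.15, 0.05]

geometric_hazard = hazard(geometric)  # constant: elapsed time is ignored
explicit_hazard = hazard(explicit)    # grows with the time spent in the state
```

For the geometric case the exit probability is the same at every step, whereas for the explicit distribution it starts at zero and rises as the state persists.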

A number of projects have used HSMMs to learn and recognize human activities of daily living 
(Zhang et al. 2008, Duong et al. 2009, van Kasteren et al. 2010). HSMMs were also applied to 
behavior understanding from video streams in a nursing center (Chung and Liu 2008). The proposed 
approach infers human behaviors through three contexts: spatial, activities, and temporal. HSMMs 
were also used in a mobility tracking application for cellular networks (Mark and Zaidi 2002). 

The activity recognition accuracy achieved by HSMMs has been compared to that of HMMs (van 
Kasteren et al. 2010). The authors evaluate the recognition performance of these models using two 
fully annotated real-world datasets consisting of several weeks of data. The first dataset was collected 
in a three-room single-resident apartment and the second dataset was from a six-room single-resident 
house. The results show that HSMM consistently outperforms HMM. This indicates that accurate 
duration modeling is important in real-world activity recognition applications as it can lead to 
significantly better performance. The use of duration in the classification process helps especially 
in scenarios where the sensor data does not provide sufficient information to distinguish between 
activities. 


1.1.4 Conditional Random Fields 


Conditional random fields (CRFs) are often considered an alternative to HMMs. The CRF is 
a statistical modeling method, a type of undirected probabilistic graphical model 
that defines a single log-linear distribution over label sequences given a particular observation 
sequence. It is used to encode known relationships between observations and construct consistent 
interpretations. 

Figure 1.10 A linear-chain CRF model. Similar to an HMM, the states of the system Y_i are 
hidden, but their corresponding outputs X_i are visible. Unlike the HMM, however, the 
graph represented by the CRF model is undirected. 


The CRF model that most closely resembles an HMM is the linear-chain CRF. As Figure 1.10 
shows, the model of a linear-chain CRF is very similar to that of an HMM (Figure 1.8). The 
model still contains hidden variables and corresponding observable variables at each time step. 
However, unlike the HMM, the CRF model is undirected. This means that two connected 
nodes no longer represent a conditional distribution. Instead, we speak of a potential between 
two connected nodes. In comparison with the HMM, the two conditional probabilities, the observation 
probability p(X_t|Y_t) and the transition probability p(Y_t|Y_{t-1}), have been replaced by the 
corresponding potentials. The essential difference lies in the way we learn the model parameters. 
In the case of HMMs, the parameters are learned by maximizing the joint probability distribution 
p(X, Y). CRFs are discriminative models. The parameters of a CRF are learned by maximizing the 
conditional probability distribution p(Y|X), which belongs to the exponential family of 
distributions (Sutton and McCallum 2006). 
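
As an illustration of how potentials define p(Y|X), the sketch below scores a label sequence by summing log-potentials along the chain and normalizes over all possible label sequences. The labels, observation symbols, and weights are hypothetical; in a real CRF the weights are learned from data, and the normalization is computed with the forward-backward recursion rather than by enumeration.

```python
import itertools
import math

labels = ["idle", "cooking"]

# Hypothetical log-potentials; a trained CRF would learn these weights.
observation_weight = {  # compatibility of label Y_t with observation X_t
    ("idle", "stove_off"): 1.0, ("idle", "stove_on"): -1.0,
    ("cooking", "stove_off"): -0.5, ("cooking", "stove_on"): 2.0,
}
transition_weight = {   # compatibility of neighboring labels (Y_{t-1}, Y_t)
    ("idle", "idle"): 0.5, ("idle", "cooking"): -0.2,
    ("cooking", "idle"): -0.2, ("cooking", "cooking"): 0.8,
}


def score(xs, ys):
    """Unnormalized log-score: sum of the log-potentials along the chain."""
    s = sum(observation_weight[(y, x)] for x, y in zip(xs, ys))
    s += sum(transition_weight[(a, b)] for a, b in zip(ys, ys[1:]))
    return s


def conditional_probability(xs, ys):
    """p(Y | X) = exp(score(X, Y)) / Z(X), with Z(X) summed over all labelings."""
    z = sum(
        math.exp(score(xs, candidate))
        for candidate in itertools.product(labels, repeat=len(xs))
    )
    return math.exp(score(xs, ys)) / z
```

Note that the weights are not probabilities and need not sum to one; only the final normalization by Z(X) turns the scores into a distribution over label sequences for the given observations.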

CRF models have been applied to activity recognition in the home from video streams, in 
which primitive actions, such as go-from-A-to-B, are recognized in a laboratory-like dining room 
and kitchen setup (Truyen et al. 2005). The results from these experiments show that CRFs 
perform significantly better than the equivalent generative HMMs even when a large portion of 
the data labels are missing. CRFs were also used for modeling concurrent and interleaving activities 
(Hu et al. 2008). The authors perform experiments using one of the MIT PlaceLab datasets (Logan 
et al. 2007), PLA1, which consists of 4 hours of sensor data. 

T. van Kasteren et al. use four different datasets, two bathroom datasets and two kitchen datasets, 
to compare the performance of HMM to that of CRF (van Kasteren et al. 2010). The experiments 
show that, when applied to activity recognition tasks, CRF models achieve higher accuracy than 
HMM models. The authors attribute the results to the flexibility of discriminative models, such 
as CRF, in dealing with violations of the modeling assumptions. However, the higher accuracy 
achieved by CRF models comes at a price: 


1. Training discriminative models takes much longer than training their generative counterparts. 

2. Discriminative models are more prone to overfitting. Overfitting occurs when a model 
describes random noise instead of the underlying relationship. This can happen when the model 
is trained only to maximize its performance on the training data. However, a model’s quality is 
determined not by how well it performs on the training data but by how well it generalizes, i.e., 
how it performs on unseen data. 


Whether the improved recognition performance of CRFs is worth the extra computational cost 
depends on the application. The data can be modeled more accurately using an HSMM, which 
allows both speedy learning and good performance, and is less prone to overfitting. However, it 
does result in slower inference and depends on correct modeling assumptions for the durations. 


Figure 1.11 An example semi-Markov CRF. Similar to an HSMM, each of the hidden 
states y_i is characterized by a start position s_i and a duration d_i. However, unlike an HSMM, the 
SMCRF graph is undirected. 


1.1.4.1 Semi-Markov Conditional Random Fields 


Similar to HMMs, which have their semi-Markov variants, CRFs also have a semi-Markov variant: 
semi-Markov conditional random fields (SMCRFs). An example SMCRF model is shown in 
Figure 1.11. The SMCRF inherits features from both semi-Markov models and CRFs as follows: 


1. It models the duration of states explicitly (like HSMM). 
2. Each of the hidden states is characterized by a start position and duration (like HSMM). 
3. The graph of the model is undirected (like CRF). 


Hierarchical SMCRFs were used in an activity recognition application on a small laboratory dataset 
from the domain of video surveillance (Truyen et al. 2008). The task was to recognize indoor 
trajectories and activities of a person from the noisy positions extracted from the video. The data 
had 90 sequences, each of which corresponded to one of three possible activities: preparing a short 
meal, preparing a normal meal, and having a snack. The hierarchical SMCRF outperformed both 
a conventional CRF and a dynamic CRF. 

SMCRFs were also used for activity recognition by van Kasteren et al. (2010). The results show 
that unlike the big improvement achieved by using HSMMs over HMMs, SMCRFs only slightly 
outperform CRFs. The authors attribute this result to the fact that CRFs are more robust in dealing 
with violations of the modeling assumptions. Therefore, explicitly modeling duration 
distributions might not bring the same significant benefits as seen with HSMMs. 


1.1.5 Support Vector Machines 


A support vector machine (SVM) is a non-probabilistic binary linear classifier. The output prediction 
of an SVM is one of the two possible classes. Given a set of training instances, each marked as 
belonging to one of the two classes, an SVM algorithm builds a hyperplane in the N-dimensional 
feature space that assigns future instances to one of the two possible output classes. 

Figure 1.12 A two-dimensional SVM model. The instances of the two possible classes are 
divided by a clear gap. 


As shown in Figure 1.12, an SVM model is a representation of the input instances as points in 
space, mapped so that the instances of the separate classes are divided by a clear gap. New examples 
are then mapped into that same space and predicted to belong to a class based on which side of 
the gap they fall on. In other words, in the two-dimensional case the goal of the SVM analysis is to 
find a line that separates the instances based on their class. There are an infinite number of possible 
lines, and one of the challenges with SVM models is finding the optimal line, i.e., the one that 
leaves the widest gap (margin) between the two classes. 
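
The following sketch illustrates the two ideas in this paragraph: classification by the side of the line on which a point falls, and the comparison of candidate lines by the width of the gap they leave. It does not train an SVM; it only compares two hand-picked separating lines on hypothetical two-dimensional data, and all names and values are assumptions made for illustration.

```python
import math


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the line w.x + b = 0."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1


def margin(w, b, points, labels):
    """Smallest signed distance from any training point to the line;
    negative if at least one point lies on the wrong side."""
    norm = math.hypot(w[0], w[1])
    return min(
        y * (w[0] * x[0] + w[1] * x[1] + b) / norm
        for x, y in zip(points, labels)
    )


# Hypothetical, linearly separable toy data.
points = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (4.0, 4.0), (5.0, 4.5), (4.5, 5.0)]
labels = [-1, -1, -1, 1, 1, 1]

# Two of the infinitely many lines that separate the training data.
candidates = {"x + y = 6": ((1.0, 1.0), -6.0), "x = 3": ((1.0, 0.0), -3.0)}
margins = {name: margin(w, b, points, labels) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)  # the candidate with the wider gap
```

Both candidate lines classify every training point correctly, but they differ in how close the nearest point lies. An SVM training algorithm searches over all lines for the one with the largest such margin, typically by solving a quadratic optimization problem.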

SVMs have been applied to a large number of sensor network applications. Sathik et al. use 
SVMs in early forest fire detection applications (Mohamed Sathik et al. 2010). SVMs were also 
applied to target classification applications for distributed sensor networks (Li et al. 2001). The 
experiments were performed on real seismic and acoustic data. SVMs are compared to a k-nearest 
neighbor algorithm and a maximum likelihood algorithm and are shown to achieve the highest 
target classification accuracy. Tran and Nguyen use SVMs to achieve accurate geographic location 
estimations for nodes in a WSN, where the majority of nodes do not have effective self-positioning 
functionality (Tran and Nguyen 2008). SVMs were also applied to investigating the possibility of 
recognizing visual memory recall (Bulling and Roggen 2011). The project aims to find if people react 
differently to images they have already seen as opposed to images they are seeing for the first time. 


1.1.6 k-Nearest Neighbor Algorithms 


The k-nearest neighbor (k-NN) algorithm is among the simplest of machine learning algorithms, 
yet it has proven to be very accurate in a number of scenarios. The training examples are vectors 
in a multidimensional feature space, each with a class label. The training phase of the algorithm 
consists only of storing the feature vectors and class labels of the training samples. A new instance 
is classified by a majority vote of its neighbors, with the instance being assigned the class that is 
most common among its k nearest neighbors (Figure 1.13). 

The best choice of k depends upon the data. k must be a positive integer and it is typically small. 
If k = 1, the new instance is simply assigned to the class of its nearest neighbor. Larger values of 
k reduce the effect of noise on the classification but make boundaries between classes less distinct. 
A good k can be selected by various heuristic techniques, for example, cross-validation. 
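
A minimal sketch of the classification rule follows. It assumes Euclidean distance and breaks voting ties in favor of the label encountered first, which is one of several reasonable choices; the coordinates are hypothetical and merely mimic the configuration of Figure 1.13.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k):
    """Label `query` by majority vote among its k nearest training points.
    Every training point is examined, so the cost grows with the training set."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical layout: two triangles and one star close to the query,
# with more stars than triangles once the neighborhood is widened.
train_points = [(0.5, 0.0), (0.0, 0.6), (1.5, 0.0),
                (-0.7, 0.0), (0.0, -1.6), (-1.7, 0.0), (1.2, 1.4)]
train_labels = ["triangle", "triangle", "triangle",
                "star", "star", "star", "star"]
query = (0.0, 0.0)
```

With these points, `knn_classify(train_points, train_labels, query, k=3)` yields the triangle class, whereas k = 7 yields the star class, which shows how the choice of k can change the outcome.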

Although the k-NN algorithm is quite accurate, the time required to classify an instance could 
be high since the algorithm has to compute the distances (or similarity) of that instance to all 
the instances in the training set. Therefore, the classification time of k-NN is proportional to the 
number of features and the number of training instances. 


Figure 1.13 Example of k-nearest neighbor classification. The question mark is the test sample, 
and it should be classified as either a star or a triangle. If k = 3, the test sample is assigned to the 
class of triangles because there are two triangles and one star inside the inner circle. If k = 7, 
the test sample is assigned to the class of stars since there are four stars and three triangles in 
the outer circle. 

k-NN algorithms have been applied to a wide variety of sensor network applications. Ganesan 
et al. propose the use of k-NN for spatial data interpolation in sensor networks (Ganesan et al. 2004). 
Due to its simplicity, k-NN allows the sampling to be done in a distributed and inexpensive manner. 
A disadvantage with this approach, however, is that k-NN interpolation techniques might perform 
poorly in highly irregular settings. Winter et al. also analyze the application of k-NN queries for 
spatial data queries in sensor networks (Winter et al. 2005). They design two algorithms based on 
k-NN, which are used to intelligently prune off irrelevant nodes during query propagation, thus 
reducing the energy consumption while maintaining high query accuracy. Duarte and Hu evaluate 
the accuracy of k-NN in the context of vehicle classification (Duarte and Hu 2004). The authors 
collect a real-world dataset and analyze both the acoustic and the seismic modality. The results 
show that in this application scenario, k-NN algorithms achieve comparable accuracy to that of 
SVMs. 


1.2 Unsupervised Learning 


Collecting labeled data is resource- and time-consuming, and accurate labeling is often hard to 
achieve. For example, obtaining sufficient training data for activity recognition in a home might 
require 3 or 4 weeks of collecting and labeling data. Further, labeling is difficult not only for remote 
areas that are not easily accessible, but also for home and commercial building deployments. For 
any of those deployments, someone has to perform the data labeling. In a home deployment, the 
labeling can be done by the residents themselves, in which case they have to keep a log of what they 
are doing and at what time. Previous experience has shown that these logs are often incomplete 
and inaccurate. An alternative solution is to install cameras throughout the house and monitor the 
activities of the residents. However, this approach is considered to be privacy-invasive and therefore 
not suitable. 

In unsupervised learning, the learner is provided with input data that has not been labeled. 
The aim of the learner is to find the inherent patterns in the data that can be used to determine 
the correct output value for new data instances. The assumption here is that there is a structure to 
the input space, such that certain patterns occur more often than others, and we want to see what 
generally happens and what does not. In statistics, this is called density estimation. 

Unsupervised learning algorithms are very useful for sensor network applications for the 
following reasons: 


■ Collecting labeled data is resource- and time-consuming. 
■ Accurate labeling is hard to achieve. 
■ Sensor network applications are often deployed in unpredictable and constantly changing 
environments. Therefore, the applications need to evolve and learn without any guidance, 
by using unlabeled patterns. 


A variety of unsupervised learning algorithms have been used in sensor network applications, 
including different clustering algorithms, such as k-means and mixture models; self-organizing 
maps (SOMs); and adaptive resonance theory (ART). In the rest of this section, we describe some 
of the most commonly used unsupervised learning algorithms. 


1.2.1 Clustering 


Clustering, also called cluster analysis, is one form of unsupervised learning. It is often employed 
in pattern recognition tasks and activity detection applications. A clustering algorithm partitions 
the input instances into a fixed number of subsets, called clusters, so that the instances in the same 
cluster are similar to one another with respect to some set of metrics (Figure 1.14). 

Cluster analysis itself is not one specific algorithm, but the general task to be solved. The 
clustering can be achieved by a number of algorithms, which differ significantly in their notion of 
what constitutes a cluster and how to efficiently find them. The choice of appropriate clustering 
algorithms and parameter settings, including values, such as the distance function to use, a density 
threshold, or the number of expected clusters, depends on the individual dataset and intended use 
of the results. 

Figure 1.14 A clustering algorithm divides the set of input data instances into groups, called 
clusters. The instances in the same group are more similar to each other than to those in other 
clusters. 

The notion of a cluster varies between algorithms and the clusters found by different algorithms 
vary significantly in their properties. Typical cluster models include the following: 


■ Connectivity models: An example of a connectivity model algorithm is hierarchical clustering, 
which builds models based on distance connectivity. 

■ Centroid models: A representative of this set of algorithms is the k-means algorithm. With 
this algorithm, each cluster is represented by a single mean vector. 

■ Distribution models: Clusters are modeled using statistical distributions. 

■ Density models: An example of density model clustering is density-based spatial clustering of 
applications with noise (DBSCAN). In this type, clusters are identified as areas with higher 
density than non-clusters. 

■ Group models: These clustering algorithms are not able to provide a refined model for the 
results. Instead, they can only generate the group information. 


We discuss in more detail two of the most common clustering algorithms used in sensor network 
applications: k-means clustering and DBSCAN clustering. 


1.2.1.1 k-Means Clustering 


The goal of k-means clustering is to partition the input instances into k clusters, where each instance 
belongs to the cluster with the nearest mean. Since the problem is NP-hard, the common approach 
is to search only for approximate solutions. There are a number of efficient heuristic algorithms 
that can quickly converge to a local optimum, such as Lloyd’s algorithm (Lloyd 1982). Since 
these algorithms find only local optima, they are usually run multiple times with different random 
initializations. 
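
The procedure can be sketched as follows. The code alternates the assignment and update steps of Lloyd's algorithm and keeps the best of several randomly initialized runs, judged by the sum of squared distances to the nearest mean. The initialization by random sampling of data points, the number of restarts, and the helper names are assumptions made for this sketch.

```python
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(cluster):
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))


def lloyd(points, k, rng, iterations=100):
    """One run of Lloyd's algorithm from a random initialization."""
    centers = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: each mean moves to the centroid of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [mean(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break  # converged to a local optimum
        centers = new_centers
    cost = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, cost


def kmeans(points, k, restarts=10, seed=0):
    """Run Lloyd's algorithm several times and keep the lowest-cost result."""
    rng = random.Random(seed)
    return min((lloyd(points, k, rng) for _ in range(restarts)), key=lambda r: r[1])
```

Keeping only the lowest-cost run does not guarantee the global optimum, but it reduces the chance of ending with a poor local one.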

An advantage of the k-means algorithm is that it is simple and converges quickly when the 
number of dimensions of the data is small. However, k-means clustering also has a number of 
drawbacks. First, k must be specified in advance. Second, the algorithm prefers clusters of approximately 
similar sizes. This often leads to incorrectly cut borders between clusters, which is not surprising 
since, being a centroid model algorithm, k-means optimizes for cluster centers rather than cluster 
borders. 

Figure 1.15 shows a clustering example, where k = 2 and k-means is not able to accurately define 
the borders between the two clusters. There are two density clusters in that figure. One of them is 
much larger and contains circles. The other one is smaller and consists of triangles. Since k-means 
optimizes for cluster centers and tends to produce clusters with similar sizes, it incorrectly splits the 
data instances into a dark and a light cluster. These two clusters, however, do not correspond to the 
original density clusters of the input data. 

k-Means clustering has been used in a number of WSN applications. A k-means algorithm is 
used in the fingerprint and timing-based snooping (FATS) security attack to cluster together sensors 
that are temporally correlated (Srinivasan et al. 2008). This allows the attack to identify sensors 
that fire together and hence identify sensors that are located in the same room. k-Means clustering 
has also been used to address the multiple sink location problem in large-scale WSNs (Oyman and 
Ersoy 2004). In large-scale networks with a large number of sensor nodes, multiple sink nodes 
should be deployed not only to increase the manageability of the network but also to prolong the 
lifetime of the network by reducing the energy dissipation of each node. Al-Karaki et al. apply 
k-means clustering to data aggregation and more specifically to finding the minimum number of 
aggregation points in order to maximize the network lifetime (Al-Karaki et al. 2004). The results 
from their experiments show that, compared to a number of other algorithms, such as a genetic 
algorithm and a simple greedy algorithm, k-means clustering achieves the highest network lifetime 
extension. 


Figure 1.15 k-Means clustering might incorrectly cut the borders between density-based 
clusters. 


1.2.1.2 DBSCAN Clustering 


DBSCAN is the most popular density-based clustering algorithm. In density-based clustering, 
clusters are defined as areas of higher density than the remainder of the dataset. DBSCAN requires 
two parameters: a distance threshold (the Eps-neighborhood of a point) and the minimum number of 
points required to form a cluster (MinPts) (Ester et al. 1996). DBSCAN is based on connecting points 
within a certain distance of each other, i.e., points that are in the same Eps-neighborhood. However, 
in order to make a cluster, DBSCAN requires that for each point in the cluster, there are at least 
MinPts points in the Eps-neighborhood. Figure 1.16 shows an example of DBSCAN clustering. The 
dataset is the same as that in Figure 1.15, but since a density-based clustering algorithm has been 
used, the data is clustered correctly. 
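
A compact sketch of the algorithm is given below. It follows the usual formulation in which the density requirement applies to core points, while border points only need to lie in the Eps-neighborhood of a core point; a point counts as its own neighbor. The neighborhood search is a plain linear scan, so this version is meant for illustration rather than for large datasets, and the names used are our own.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one cluster id per point (NOISE for outliers)."""
    labels = [None] * len(points)

    def neighbors(i):
        # Eps-neighborhood of point i, including the point itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster    # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not dense itself
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return labels
```

Because points that never fall into a dense neighborhood keep the NOISE label, outliers are left out of all clusters without any additional processing.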

An advantage of DBSCAN is that, unlike many other clustering algorithms, it can form clusters 
of arbitrary shape. Another useful property of the algorithm is that its complexity is fairly low 
and it will discover essentially the same clusters in each run. Therefore, in contrast to k-means 
clustering, DBSCAN needs to be run only once rather than multiple times. The main drawback of 
DBSCAN is that it expects a sufficiently large density drop in order to detect cluster borders. If 
the cluster densities decrease continuously, DBSCAN might often produce clusters whose borders 
look arbitrary. 

In sensor network applications, DBSCAN has been used as part of the FATS security attack 
to identify the function of each room, such as bathroom, kitchen, or bedroom (Srinivasan 
et al. 2008). DBSCAN generates temporal activity clusters, each of which forms a continuous 
temporal block with a relatively high density of sensor firings. Experiments show that DBSCAN 
performs very well because it automatically leaves out outliers and computes high-density clusters. 
However, when DBSCAN is applied to the step of identifying which sensors are in the same 
room, k-means clustering performs much better. This is especially true for scenarios where all 
devices are highly correlated temporally and there is no significant density drop on the boundary of 
clusters. 


Figure 1.16 An example density-based clustering with DBSCAN. 

Apiletti et al. also apply DBSCAN to detecting sensor correlation (Apiletti et al. 2011). The 
authors perform experiments using data collected from a sensor network deployed in university 
laboratories. The results show that DBSCAN is able to identify different numbers of clusters based 
on which day of the week it is analyzing. This allows it to construct more accurate models for the 
sensor use patterns in the laboratories. DBSCAN also successfully detects noisy sensors. 


1.2.2 Self-Organizing Map 


SOMs provide a way of representing multidimensional data in much lower dimensional spaces— 
typically one or two dimensions. The process of reducing the dimensionality of the feature vectors 
is a data compression technique known as vector quantization. SOMs, as indicated by their name, 
produce a representation of the compressed feature space, called a map. An extremely valuable 
property of these maps is that the information is stored in such a way that any topological 
relationships within the training set are maintained. 

A SOM contains components called nodes. Each node is associated with (1) a position in the 
map space and (2) a vector of weights, where the dimension of this vector is the same as that of the 
input data instances. The nodes are regularly spaced in the map, which is typically a rectangular or 
a hexagonal grid. A typical example of SOMs is a color map (Figure 1.17). Each color is represented 
by a three-dimensional vector containing values for red, green, and blue. However, the color SOM 
represents the colors in a two-dimensional space. 

Figure 1.17 An example SOM representation for colors. 

The procedure of placing an input data instance onto the map is the following: 


1. Initialize the weights of the nodes on the map. 

2. Choose an input training instance. 

3. Find the node with the closest vector to that of the input instance. This node is called the 
best matching unit (BMU). 

4. Calculate the radius of the BMU’s neighborhood. This value is often initially set to the radius of 
the whole map, and it decreases at each time step. Any node found within this radius is 
considered to be inside the BMU’s neighborhood. 

5. Once the BMU is located, it is assigned the values from the vector of the input instance. 
In addition, the weights of the nodes close to the BMU are also adjusted towards the input 
vector. The closer a neighbor node is to the BMU, the more its weight is altered. 
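
The five steps above can be sketched as a short training loop. This is a common variant rather than a definitive implementation: the BMU is moved only part of the way toward the input instead of being overwritten, and the exponential decay schedule, the Gaussian neighborhood weighting, and the initial learning rate of 0.5 are assumptions that the text does not prescribe.

```python
import math
import random


def best_matching_unit(weights, x):
    """Step 3: the node whose weight vector is closest to the input."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


def train_som(data, rows, cols, steps=500, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    # Step 1: initialize the weight vector of every node on the grid.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    initial_radius = max(rows, cols) / 2.0
    for step in range(steps):
        # Step 2: choose an input training instance.
        x = rng.choice(data)
        bmu = best_matching_unit(weights, x)
        # Step 4: the neighborhood radius (and learning rate) shrink over time.
        decay = math.exp(-step / steps)
        radius = initial_radius * decay
        rate = 0.5 * decay
        # Step 5: pull the BMU and its neighbors toward the input;
        # nodes closer to the BMU on the grid are altered more.
        for node, w in weights.items():
            grid_dist = math.dist(node, bmu)
            if grid_dist <= radius:
                influence = math.exp(-grid_dist ** 2 / (2 * radius ** 2))
                for i in range(dim):
                    w[i] += rate * influence * (x[i] - w[i])
    return weights
```

For the color example, `train_som([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)], rows=4, cols=4)` returns a dictionary that maps each grid position to a three-dimensional weight vector, i.e., a color.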


In sensor networks, SOMs have been applied to the detection of anomalies caused by faulty sensors and 
unusual phenomena, such as harsh environmental conditions (Siripanadorn et al. 2010). Paladina 
et al. have also used SOMs for node localization (Paladina et al. 2007). Their localization technique 
is based on a simple SOM implemented on each of the sensor nodes. The main advantages of this 
approach are the limited storage and computing cost. However, the processing time required by 
the SOMs increases with the size of the input data. Giorgetti et al. have also applied SOMs to 
addressing node localization (Giorgetti et al. 2007). Their SOM-based algorithm computes virtual 
coordinates that are used in location-aided routing. If the location information for a few anchor 
nodes is available, the algorithm is also able to compute the absolute positions of the nodes. The 
results from the experiments further show that the SOM-based algorithm performs especially well 
for networks with low connectivity, which tend to be harder to localize, and in the presence of 
irregular radio patterns or anisotropic deployment. A variation of a SOM, called growing 
self-organized map, is employed to achieve accurate detection of human activities of daily living 
within smart home environments (Zheng et al. 2008). 


1.2.3 Adaptive Resonance Theory 


Most existing learning algorithms are either stable (they preserve previously learned information) or 
plastic (they retain the potential to adapt to new input instances indefinitely). Typically, algorithms 
that are stable cannot easily learn new information and algorithms that are plastic tend to forget 
the old information they have learned. This conflict between stability and plasticity is called the 
stability-plasticity dilemma (Carpenter and Grossberg 1987). 

The ART architectures attempt to provide a solution to the stability-plasticity dilemma. ART is 
a family of different neural architectures that address the issue of how a learning system can preserve 
its previously learned knowledge while keeping its ability to learn new patterns. An ART model is 
capable of distinguishing between familiar and unfamiliar events, as well as between expected and 
unexpected events. 

An ART system contains two functionally complementary subsystems that allow it to process 
familiar and unfamiliar events: an attentional subsystem and an orienting subsystem. Familiar events 
are processed within the attentional subsystem. The goal of this subsystem is to constantly establish 
ever more precise internal representations of and responses to familiar events. By itself, however, 
the attentional subsystem is unable to simultaneously maintain stable representations of familiar 
categories and to create new categories for unfamiliar events. This is where the orienting subsystem 
helps. It is used to reset the attentional subsystem when an unfamiliar event occurs. The orienting 
subsystem is essential for expressing whether a novel pattern is familiar and well represented by an 
existing recognition code, or unfamiliar and in need of a new recognition code. 

Figure 1.18 shows the architecture of an ART system. The attentional subsystem has two successive 
stages, F1 and F2, which encode patterns of activation in short-term memory. The input pattern 
is received at F1, and the classification is performed at F2. Bottom-up and top-down pathways 
between the two stages contain adaptive long-term memory traces. The orienting subsystem 
measures the similarity between the input instance vector and the pattern produced by the fields 
in the attentional subsystem. If the two are similar, i.e., if the attentional subsystem has been able 
to recognize the input instance, the orienting subsystem does not interfere. However, if the two 
patterns are significantly different, the orienting subsystem resets the output of the recognition 
layer. The effect of the reset is to force the output of the attentional system back to zero, which 
allows the system to search for a better match. 
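
The match-and-reset cycle can be illustrated with a heavily simplified fuzzy ART classifier. This sketch is not the full architecture of Figure 1.18: the complement coding, the choice function, and the parameter names (`vigilance`, `choice`, `rate`) follow the common fuzzy ART formulation and are assumptions here, and the short-term memory dynamics are replaced by a simple loop over candidate categories.

```python
def fuzzy_and(a, b):
    """Component-wise minimum of two vectors."""
    return [min(x, y) for x, y in zip(a, b)]


class FuzzyART:
    def __init__(self, vigilance=0.75, choice=0.001, rate=1.0):
        self.vigilance = vigilance  # similarity demanded by the orienting subsystem
        self.choice = choice        # small constant in the choice function
        self.rate = rate            # 1.0 corresponds to fast learning
        self.categories = []        # long-term memory: one weight vector per category

    def learn(self, x):
        """Present one input with components in [0, 1]; return its category index."""
        i = list(x) + [1.0 - v for v in x]  # complement coding
        norm_i = sum(i)
        # Attentional subsystem: rank categories by how strongly they respond.
        order = sorted(
            range(len(self.categories)),
            key=lambda j: -sum(fuzzy_and(i, self.categories[j]))
            / (self.choice + sum(self.categories[j])),
        )
        for j in order:
            w = self.categories[j]
            overlap = fuzzy_and(i, w)
            if sum(overlap) / norm_i >= self.vigilance:
                # Familiar input: refine the existing category.
                self.categories[j] = [
                    self.rate * o + (1.0 - self.rate) * v for o, v in zip(overlap, w)
                ]
                return j
            # Otherwise the orienting subsystem resets this category
            # and the search continues with the next candidate.
        self.categories.append(i)  # unfamiliar input: create a new category
        return len(self.categories) - 1
```

In this sketch, an input that resembles a stored category refines that category, whereas an input that fails the vigilance test for every category creates a new one without overwriting the old ones, which is the behavior the stability-plasticity discussion calls for.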

A drawback of some of the ART architectures is that the results of the models depend significantly 
on the order in which the training instances are processed. The effect can be reduced to some extent 
by using a slower learning rate, where differential equations are used and the degree of training on 
an input depends on the time the input is available. However, even with slow training, the order 
of training still affects the system regardless of the size of the input dataset. 


Figure 1.18 The architecture of an ART system has two subsystems: attentional, responsible 
for processing familiar events, and orienting, which helps reset the attentional subsystem when 
an unfamiliar event occurs. The attentional subsystem contains a comparison field, where the 
input is received, and a recognition field, which assigns the input to a category. Both short-term 
memory (STM) and long-term memory (LTM) are employed. 

ART classifiers have been applied to WSN applications to address anomaly detection problems 
in unknown environments (Li et al. 2010). A fuzzy ART classifier is used to label multidimensional 
sensor data into discrete classes and detect sensor-level anomalies. An ART classifier is also 
employed by an intruder detection system that uses a WSN and mobile robots (Li and Parker 2008). 
The sensor network uses an unsupervised fuzzy ART classifier to learn and detect intruders in a 
previously unknown environment. Upon the detection of an intruder, a mobile robot travels to 
investigate the position where the intruder is supposed to be. Kulakov and Davcev incorporate 
ART into a technique used for detection of unusual sensor events and sensor failures (Kulakov and 
Davcev 2005). Through simulation, in which one of the input sensor nodes is deliberately made to fail, the 
authors show the improvement in data robustness achieved by their approach. 


1.2.4 Other Unsupervised Machine Learning Algorithms 


There is a wide variety of unsupervised learning algorithms, in addition to k-means clustering, 
DBSCAN, SOM, and ART, that have often been applied to WSN applications. The SmartHouse 
project uses a system of sensors to monitor a person’s activities at home (Barger et al. 2005). The 
goal of the project is to recognize and detect different behavioral patterns. The authors use mixture 
models to develop a probabilistic model of the behavioral patterns. The mixture model approach 
serves to cluster the observations with each cluster considered to be a different event type. 

A number of activity recognition projects have developed unsupervised learning algorithms that 
extract models from text corpora or the web. The Guide project uses unsupervised learning methods 
to detect activities using RFID tags placed on objects (Philipose et al. 2003). This method relies 
on data mining techniques to extract activity models from the web in an unsupervised fashion. For 
this project, the authors have mined the temporal structure of about 15,000 home activities. 

Gu et al. develop another unsupervised approach based on RFID-tagged object-use fingerprints 
to recognize activities without human labeling (Gu et al. 2010). The activity models they use are 
built based on object-use fingerprints, which are sets of contrast patterns describing significant 
differences in object use between any two activity classes. This is done by first mining a set of object 
terms for each activity class from the web and then mining contrast patterns among object terms 
based on emerging patterns to distinguish between any two activity patterns. 

Wyatt et al. also employ generic mined models from the web (Wyatt et al. 2005). Given an 
unlabeled trace of object names from a user performing their activities of daily living, they use the 
generic mined models to segment the trace into labeled instances of activities. After that, they use 
the labeled instances to learn custom models of the activity from the data. For example, they learn 
details such as order of object use, duration of use, and whether additional objects are used. 

Tapia et al. develop a similar approach where they extract relevant information on the functional 
similarity of objects automatically from WordNet, which is an online lexical reference system for 
the English language (Tapia et al. 2006). The information about the functional similarity among 
objects is represented in a hierarchical form known as an ontology. This ontology is used to help mitigate 
the problem of model incompleteness, which often affects the techniques used to construct activity 
recognition models. 

An unsupervised approach based on detecting and analyzing the sequence of objects that are 
being used by the residents is described in Wu et al. (2007). The activity recognition method is 
based on RFID object use correlated with video streams and information collected from how-to 
websites such as about.com. Since video streams are used, the approach provides fine-grained 
activity recognition. For example, it can differentiate between making tea and making coffee. 
However, as previously mentioned, collecting video data of home activities is difficult due to 
privacy concerns. 

Dimitrov et al. develop a system that relies on unsupervised recognition to identify activities of 
daily living in a smart home environment (Dimitrov et al. 2010). The system utilizes background 
domain knowledge about the user activities, which is stored in a self-updating probabilistic knowledge 
base. The system aims to build the best possible explanation for the observed stream of sensor events. 


1.3 Semi-Supervised Learning 


Semi-supervised learning algorithms use both labeled and unlabeled data for training. The labeled 
data is typically a small percentage of the training dataset. The goal of semi-supervised learning is 
to (1) understand how combining labeled and unlabeled data may change the learning behavior 
and (2) design algorithms that take advantage of such a combination. Semi-supervised learning is 
a very promising approach since it can use readily available unlabeled data to improve supervised 
learning tasks when the labeled data is scarce or expensive. 

There are many different semi-supervised learning algorithms. Some of the most commonly 
used ones include the following: 


■ Expectation-maximization (EM) with generative mixture models: EM is an iterative method 
for finding maximum likelihood estimates of parameters in statistical models, where the 
models depend on unobserved latent variables (Dempster et al. 1977). Each iteration of the 
algorithm consists of an expectation step (e-step) followed by a maximization step (m-step). 
EM with generative mixture models is suitable for applications where the classes specified 
by the application produce well-clustered data. 

■ Self-training: Self-training can refer to a variety of schemes for using unlabeled data. Ng 
and Cardie implement self-training by bagging and majority voting (Ng and Cardie 2003). 
An ensemble of classifiers is trained on the labeled data instances and then the classifiers 
are used to classify the unlabeled examples independently. Only those examples, for which 
all classifiers assign the same label, are added to the labeled training set, and the classifier 
ensemble is retrained. The process continues until a stop condition is met. 

A single classifier can also be self-trained. Similar to the ensemble of classifiers, the single 
classifier is first trained on all labeled data. Then the classifier is applied to the unlabeled 
instances. Only those instances that meet a selection criterion are added to the labeled set 
and used for retraining. 

■ Co-training: Co-training requires two or more views of the data, i.e., disjoint feature sets that 
provide different complementary information about the instances (Blum and Mitchell 1998). 
Ideally, the two feature sets for each instance are conditionally independent. Each feature 
set should also be sufficient to accurately assign each instance to its respective class. The first step 
in co-training is to use all labeled data to train a separate classifier for each view. Then, 
the most confident predictions of each classifier on the unlabeled data are used to construct 
additional labeled training instances. Co-training is a suitable algorithm to use if the features 
of the dataset naturally split into two sets. 

■ Transductive SVMs: Transductive SVMs extend general SVMs in that they can also use 
partially labeled data for semi-supervised learning by following the principles of transduction 
(Gammerman et al. 1998). In inductive learning, the algorithm is trained on specific training 
instances, but the goal is to learn general rules, which are then applied to the test cases. By 
contrast, transductive learning is reasoning from specific training cases to specific testing 
cases. 

■ Graph-based methods: These are algorithms that utilize the graph structure obtained by 
capturing pairwise similarities between the labeled and unlabeled instances (Zhu 2007). These 
algorithms define a graph structure where the nodes are labeled and unlabeled instances, and 
the edges, which may be weighted, represent the similarity of the nodes they connect. 
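
As an illustration of the single-classifier self-training loop described in the list above, the following is a minimal sketch under stated assumptions. The base learner (a one-dimensional nearest-centroid classifier), the margin-based selection criterion, the `min_margin` threshold, and all function names are hypothetical choices made for this example; the chapter does not prescribe them.

```python
def fit_centroids(points, labels):
    """Return {label: mean of the 1-D points carrying that label}."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_with_margin(centroids, x):
    """Return (closest label, margin); needs at least two classes.

    The margin is the distance to the runner-up centroid minus the
    distance to the closest one, used here as a confidence score.
    """
    ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(x - centroids[runner_up]) - abs(x - centroids[best])
    return best, margin


def self_train(labeled, unlabeled, min_margin=1.0, max_rounds=10):
    """Self-train one classifier; return (final centroids, still-unlabeled pool)."""
    points = [x for x, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit_centroids(points, labels)   # (re)train on the labeled set
        accepted, remaining = [], []
        for x in pool:
            y, margin = predict_with_margin(centroids, x)
            (accepted if margin >= min_margin else remaining).append((x, y))
        if not accepted:                            # stop condition
            break
        for x, y in accepted:                       # grow the labeled set
            points.append(x)
            labels.append(y)
        pool = [x for x, _ in remaining]
    return fit_centroids(points, labels), pool


centroids, leftover = self_train([(0.0, "a"), (10.0, "b")], [1.0, 2.0, 9.0, 5.0])
print(centroids, leftover)
```

Instances that never meet the selection criterion, such as the ambiguous point midway between the two classes here, stay unlabeled. A stricter criterion slows learning but reduces the risk of reinforcing early mistakes.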


In sensor networks, semi-supervised learning has been applied to localization of mobile objects. 
Pan et al. develop a probabilistic semi-supervised learning approach to reduce the calibration effort 
and increase the tracking accuracy of their system (Pan et al. 2007). Their method is based on 
semi-supervised CRFs, which effectively enhance the learned model from a small set of training 
data with abundant unlabeled data. To make the method more efficient, the authors employ 
a Generalized EM algorithm coupled with domain constraints. Yang et al. use a semi-supervised 
manifold learning algorithm to estimate the locations of mobile nodes in a WSN (Yang et al. 2010). 
The algorithm is used to compute a subspace mapping function between the signal space and the 
physical space by using a small amount of labeled data and a large amount of unlabeled data. 

Wang et al. develop a semi-supervised learning algorithm based on SVM (Wang et al. 2007). 
The algorithm has been applied to target classification, and the experimental results show that it can 
accurately classify targets in sensor networks. 

Semi-supervised learning has also been applied to object detection and recognition of commonly 
displaced items. Xie et al. propose a dual-camera sensor network that can be used as a memory 
assistant tool (Xie et al. 2008). Their approach extracts the color features of every new object 
and then uses a semi-supervised clustering algorithm to classify the object. The user is provided 
with the option to review the results of the classification algorithm and label images that have 
been mislabeled, thus providing real-time feedback to the system to refine the data model of the 
semi-supervised clustering. 


1.4 Summary 


Machine learning has been steadily entering the area of sensor network applications. Since its 
application to routing problems in wireless networks as early as 1994 (Cowan et al. 1994), it has 
been used to address problems such as activity recognition, localization, sensor fusion, monitoring 
and alerting, outlier detection, and energy efficiency in the home, to name a few. Future work will 
further extend both the application domains and the set of machine learning techniques that are used. 
2.1 Introduction 


During preprocessing of the event log, training samples are sorted by their area of fire damage 
into high, medium, small, and accidental small fire classes. The preprocessing step allows us to study 
the probability distribution; in our case the samples are highly skewed, indicating that 
accidental small fires are more likely than large fires. Statistically, we are interested in the 
many factors that influence accidental small fires, which, according to the training samples, follow 
a normal distribution. Plotted, the normal distribution takes the form of a bell-shaped curve, 
also known as the Gaussian function. It is a first approximation for a real-valued random 
variable that tends to cluster around a single mean value. The sensor model needs to learn the 
expected ranges of the baseline attributes being measured, and its density estimate improves as 
the sample count increases. The baseline discrete parameters capture only the sensor ranges, which 
makes the event prediction function hard to train with a Gaussian density function without a specific 
temporal understanding of the datasets. The dynamic features present in a sequence of patterns are 
localized and used to predict events; a static data mining algorithm might otherwise not treat them 
as contributing features. 

We have studied the spatial features, the baseline discrete sensor measurements, and all the 
available attributes, which have a high classification error [1] that can sometimes amount to 50% 
in the accidental small fire category. To further investigate factors that can cause 
such fires, we include data on human-specific temporal attributes, such as the number of 
visitors and traffic patterns coming into the forest area, thus filtering events with local significance. 
Temporal attributes are a better estimator given the type of training samples, which are difficult 
to calibrate and for which any approximation may induce false alarms. A relevance-based ranking 
function is well suited to ordering the higher bound samples chosen by domain experts as ideal 
estimates while still maintaining the desired low false alarm rates. The ranking method uses a function 
of sensor precision and event relevance weights, which are added linearly to represent data from fire 
activity logs. The rest of the chapter is organized as follows. Sections 2.2 and 2.3 provide related 
work and the state of the art. Section 2.4 defines sensor measurements and fire activity to model the 
data and the computational complexity of the algorithm. Section 2.5 defines the additive ranking weight 
used to classify accidental small fires from the precision and relevance of the event collection. 
Section 2.6 discusses the performance of the large fire ranking function in terms of Fire Weather 
Index (FWI) [2] attributes, using precision, and discusses the error rate in terms of false alarms. 
Sections 2.7 and 2.8 discuss preliminary information retrieval (IR) [3] choices with extensive 
simulation using Machine Learning (ML) probabilistic algorithms. Section 2.9 uses the WEKA [4,5] tool 
to perform attribute selection ranking automatically, which provides good statistical knowledge of the 
weather data. Section 2.10 discusses the constraints of the machine learning model with respect to 
weather data and how to use a Bayes net. Future work in Section 2.11 discusses massive datasets and 
how to use a new computational framework and port ML algorithms to it. The chapter concludes in 
Section 2.12 with a summary of the results. 


2.2 Background 


The machine learning repository [6] provides a collection of supervised databases that are used for 
the empirical analysis of event prediction algorithms with unsupervised datasets from distributed 
wireless sensor networks. Sensor networks generate huge amounts of data that need to be validated 
for relevance; keeping only the necessary data helps avoid the high computational overhead caused by 
data redundancy. Calibration of sensors may not always be possible, and the data aggregation algorithms 
need domain rules to detect outliers in the datastream, given that other parameters are 
kept constant. Given a dataset of forest fire events for a region, the training algorithm can 
transform correlated attributes from the sensor network datastream to validate and classify the events 
and reject the outliers reliably. The preliminary work models the empirical data with a ranking 
function, without spatial information, to predict the likelihood of different events. IR concepts 
such as precision and relevance are used in the context of sensor networks; they not only help in 
understanding the domain topics but also add reliability to the large dataset. In this case study, we 
use forest fires and environmental conservation as the theme and study which environmental factors 
contribute to such events, for example, temporal attributes such as human presence, peak weather 
conditions, surface fuel buildup, and wind spread factors. The Burnt Area (BA), which is the ground 
truth, is studied broadly with respect to small and large fires. The framework needs to extend queries 
to topics that are spatially aware, making sensing an essential source for discovering information. 


2.3 State of the Art 


The sensitivity of a sensor network depends not only on the accuracy of individual sensors but also 
on the spatially distributed collaborative sensing of neighboring sensors. We can now define the data 
mining criteria, which are standard measures for evaluating queries in a sensor network. 

Precision: How closely a feature's measurements match at every independent value when measured 
over time. 

Accuracy: How redundant the basic feature sampling measurement assumptions are, given that 
the independent and identically distributed (i.i.d.) statistical assumption for measuring correlated, 
spatially arranged sensors does not hold for medium to large weather datasets (e.g., see the confusion 
matrix in Table 2.1; the formulas for ranking precision and accuracy are given below). The more 
standard form of precision and recall is given in the Ranking section, 2.5.1. 


Table 2.1 Confusion Matrix of Classifier 

                | Predicted Positive | Predicted Negative 
Actual Positive | tp                 | fn 
Actual Negative | fp                 | tn 

Figure 2.1 Bayes Net representation. 

$$\text{accuracy} = \frac{tp + tn}{tp + tn + fp + fn}$$

$$\text{precision} = \frac{tp}{tp + fp}$$

$$F_\beta = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

The first criterion, precision, cannot be generalized in the case of weather data, because weather 
data are not independent and identically distributed (i.i.d.) but are highly correlated 
features. As most generative models, such as Naive Bayes, strictly assume i.i.d. data, one needs to 
find alternative models that better approximate the observed samples. A better model than basic 
Naive Bayes, one that uses conditional probabilities, is its network-dependent version, 
called a Bayes Net. This model allows us to specify which conditional relationships depend on the 
observed phenomena and which are independent of them. A typical Bayes Net is illustrated in Figure 
2.1, which identifies entities such as small and large forest fires and their dependencies. The 
multi-feature model allows us to analyze the general pattern in the weather data, and local events are 
studied further by including temporal dependencies in the form of human traffic patterns. The enhanced 
model not only improves the statistical assumptions [7] but also performs much better in error 
handling, which is one of the major disadvantages of large, low-resource, energy-constrained sensor 
networks. In this work, one of the goals is to better estimate the performance of the underlying 
algorithms. To illustrate the need for such a metric, we use three algorithms and measure their 
precision and recall, as shown in Table 2.2. The average is calculated by combining precision and 
recall. Algorithm III has the highest average even though it has very low precision; to overcome this 
limitation we adopt a weighted average, which is also called the F-score. The calculated F-score is 
shown in the final column for all three algorithms, and the F-score properly ranks Algorithm III 
lowest because of its low precision. 
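
The comparison in Table 2.2 can be checked with a short calculation. The sketch below assumes that the "average" column is the arithmetic mean of precision and recall and that the F-score is their harmonic mean; the function and variable names are illustrative only.

```python
def f_score(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as listed in Table 2.2.
table_2_2 = {
    "Algorithm I": (0.5, 0.4),
    "Algorithm II": (0.7, 0.1),
    "Algorithm III": (0.02, 1.0),
}

for name, (p, r) in table_2_2.items():
    average = (p + r) / 2
    print(f"{name}: average={average:.3f}  F-score={f_score(p, r):.4f}")
```

The arithmetic mean rewards Algorithm III for its perfect recall, whereas the harmonic mean is dominated by the smaller of the two values, so a very low precision pulls the F-score toward zero.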

Table 2.2 Mean Comparison of Precision and Recall 

              | Precision | Recall | Average | F-score 
Algorithm I   | 0.5       | 0.4    | 0.45    | 0.444 
Algorithm II  | 0.7       | 0.1    | 0.4     | 0.175 
Algorithm III | 0.02      | 1.0    | 0.51    | 0.0392 

Adding sensor data to the manual forest fire event logs allows the study of automated, correlated, 
real-time information which, properly trained, allows estimation and classification of future sensor 
outputs from the same geographical region. Event logs contain spatial information such as GPS 
and the area of the fire-damaged region but lack correlated information that could lead to better 
estimation of the fire event under study. Moreover, the spatial information is collected manually 
and takes a considerable amount of time to classify, while the sensor measurements are taken 
in real time and can be approximated by machine learning algorithms to classify events and send a 
notification when a fire alarm condition has been reached. 

Ranking functions allow filtering out unrelated data and presenting only the information relevant to 
the user's query [8-10]. In the IR domain, there are many efficient ways to rank the relevance of 
a document in a collection given a user query. Similarly, we would like to rank the order of an event 
occurring given the precision and relevance of the prior fire probabilities and a hypothesis. The 
terms defined as precision and relevance are inversely proportional. In problems where the recorded 
evidence is small and rare, one can instead use a precision scale to rank the evidence and treat 
other correlated events as relevant, balancing out the summed weights used in event ranking. We 
rank the precision weights higher whenever fire events occur along with higher alarm conditions. 
The high alarm condition is always true for precision ranking and holds equally across accidental 
small, medium, and large fires, unlike relevance, which accounts for the majority of fires, including 
only the accidental small ones. The precision ranking uses ground truth such as actual fire 
evidence, further reducing the possibility of errors due to outliers and weak evidence. These 
ground truths are well-established, naturally occurring phenomena that occur rarely and leave 
significant evidence, measured as BA in square meters of forest. A relevance ranking weight performs 
an exhaustive search of all the prior fire events, which makes it unique to the natural habitat of the 
particular geographical area where the concept is machine learned. 


2.4 Sensor Measurement and Fire Activity 


The samples of the fire events are sorted by BA in hectares (ha); the frequency distribution 
versus BA is shown in Figure 2.2. We can model the BA as a function, as shown in 
Equation 2.1, which allows us to study the behavior of fire activities over time and predict new 
events reliably. 


BA = f (x) (2.1) 


2.4.1 Sorting by BA (ha) 


The histogram shows that the BA is skewed, with a large number of small fires and very few large 
fires, making the likelihood of small fires more predictable. We further classify fires into four 
categories, which allows us to assess the performance of the system in terms of precision and 
relevance. Figure 2.2b shows the new classes without any of the weather attributes that are likely to 
cause the events. 

Figure 2.2 Histograms of empirical samples. (a) Empirical log collection and (b) four-class 
classification. 


2.4.2 Sorting by BA with Temporal Attributes 


$$V_{train}(FireEvent_{\text{Day of week}}) \leftarrow \hat{V}(FireEvent_{\text{Day of week}})_{\text{Temporal Variable}} + \hat{V}(FireEvent_{\text{Day of week}})_{\text{Correlated measurements}} \tag{2.2}$$

Temporal Variables = Month of the year + Day of the week (2.3) 
Correlated measurement = temperature + humidity + wind + rain (2.4) 
Classifiers = {accidental; small; medium; large} (2.5) 


2.4.3 Estimating Training Values with Sample Data 


Sample datasets are based on 517 Fire Location rules from the UCI forest fire repository, used to 
classify fire activity for a geographical area. The equation representing the target function of BA 
from the empirical data is given in Equation 2.1. In our case, the hypothesis to be maximized in terms 
of the temporal attributes is given in Equation 2.2 for the four-class classification of Equation 
2.5, in accordance with Equation 2.1. The assumption here is that the training set D is an unbiased 
representation from which to learn the concept c and can estimate the inputs x_i. The previously 
defined dependent variable, Fire Location, which is estimated from the independent correlated 
measurements, and its relation to the temporal attributes are given in Equations 2.4 through 2.6. 
The target concepts are present in the training samples, and we would like to see how adding 
unlabeled sensor measurements helps learn more accurately the concepts of human-induced 
accidental small fires versus the more naturally occurring medium and large fires. For 
clarity within the machine learning domain, we convert the correlated sensor data to ordinal [11,12] 
types, as illustrated in the following: 


temperature = {cool; mild; hot} (2.6) 
humidity = {normal; high} (2.7) 
wind = {true; false} (2.8) 
The model estimate of the target function, with weights w_1 and w_2 as shown, allows the training 
error to be minimized, where x_1 and x_2 are the temporal and correlated measurements. 

$$\hat{V} = w_1 x_1 + w_2 x_2 \tag{2.9}$$

The learning algorithm needs to find the best fit for the given hypothesis and adjust the weights 
to minimize the error and misclassifications. 

$$E = \sum \left( V_{train}(FireEvent) - \hat{V}(FireEvent) \right)^2 \tag{2.10}$$
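
One way the weights of Equation 2.9 could be adjusted to reduce the squared error of Equation 2.10 is the least-mean-squares (LMS) rule. The sketch below is an illustration under assumptions: the chapter does not state which update rule is used, and the learning rate, epoch count, function names, and the four training samples are hypothetical.

```python
def predict(weights, x):
    """Linear estimate V_hat = w1*x1 + w2*x2 (Equation 2.9)."""
    return sum(w * xi for w, xi in zip(weights, x))


def squared_error(weights, samples):
    """Sum of squared training errors (Equation 2.10)."""
    return sum((v_train - predict(weights, x)) ** 2 for x, v_train in samples)


def lms_fit(samples, rate=0.05, epochs=200):
    """Repeatedly apply the LMS step w_i <- w_i + rate * (V_train - V_hat) * x_i."""
    weights = [0.0, 0.0]
    for _ in range(epochs):
        for x, v_train in samples:
            error = v_train - predict(weights, x)
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
    return weights


# Hypothetical samples ((x1, x2), V_train), generated from V = 2*x1 + 1*x2.
samples = [((1, 0), 2.0), ((0, 1), 1.0), ((1, 1), 3.0), ((2, 1), 5.0)]
weights = lms_fit(samples)
print(weights, squared_error(weights, samples))
```

On these noise-free samples the weights approach the values that generated the data. With real sensor data the error would not reach zero, and the learning rate would need to be small enough for the updates to remain stable.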


2.4.4 Algorithm Complexity 


The search space consists of all the possible patterns of the features: given our data model, there 
are 4 × 3 × 3 = 36 possibilities for each rule when using the attributes temperature, humidity, and 
wind. As there are 517 rules from the collected dataset instances and each rule can have 36 
possibilities, the complete search space has $36^{517}$ different possibilities. To reduce the 
complexity of the search space, we can cut down on the sample instances by using spatial clustering 
and removing redundancies in similar features. Given the (X, Y) positions, we can cluster the possible 
fire types into groups: accidental small fires, and those with medium and larger BA as large fires. As 
measurements of ambient phenomena are correlated, we expect clustering to be well suited. If we 
take five clusters to contain all the samples, the search space reduces to $36^{5} \approx 60 \times 10^{6}$ 
possible rule sets. These methods are used with preprocessing to reduce redundancies in the model 
and are very practical optimizations of machine learning algorithms. To judge the effectiveness of 
the model and of the classification, we initially rely on a real-valued numeric model such 
as [13] to estimate the errors. In contrast to that approach, we use the ordinal values defined 
in Equations 2.6 through 2.8 to build a tree classifier and further reduce errors. 
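
The size of the search space quoted above follows from simple counting, as the short calculation below shows. The reading of the factors 4, 3, and 3 is an assumption: it matches the listed ordinal values (3 for temperature, 2 for humidity, 2 for wind) only if each attribute also admits one "any value" option. The variable names are illustrative.

```python
# Options per attribute as used in the text: temperature, humidity, wind.
# Assumption: each count is the number of listed values plus one
# "any value" option, which is one way to arrive at 4 x 3 x 3.
options_per_attribute = [4, 3, 3]

patterns_per_rule = 1
for n in options_per_attribute:
    patterns_per_rule *= n

full_space = patterns_per_rule ** 517      # one pattern choice per rule
clustered_space = patterns_per_rule ** 5   # one pattern choice per cluster

print(patterns_per_rule)                   # 36
print(len(str(full_space)), "decimal digits in the full search space")
print(clustered_space)                     # 60466176, about 60 x 10^6
```

Reducing 517 rules to five clusters shrinks the exponent, not just a constant factor, which is why the clustered space becomes small enough to search.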


2.5 Alarm Ranking Function for Accidental Small Fires 


Event data logs, such as those for forest fires in our example, may be incomplete, which raises the 
question of how to infer knowledge from the missing data. The relevance factor of the fire event 
learning concept can be defined as all fire events that are tagged in the log, as we are interested 
in reporting fire occurrences. The inverse concept, precision, can be seen in the histogram plot in 
Figure 2.2a, where there are very few large fires. When reporting on major fire events, the 
highest-ranked samples are retrieved, leading to a precision learning concept close to 1. 


2.5.1 Ranking Function 


The design of a good ranking function needs to balance the relevance and precision of the events 
so that they can be expressed as a summable numerical quantity, which signifies the importance of the 
new sample and how reliable the prior probabilities were, as shown in Table 2.1 with false positives. 
As ranking functions are evaluated for a given query, we define the query criteria for retrieving 
accidental small fires and large wild fires from our collection. 


$$\text{Precision} = \frac{\text{number of relevant forest fire events retrieved}}{\text{number of forest fires retrieved in query}}$$

$$\text{Relevance} = \frac{\text{number of relevant forest fire events retrieved}}{\text{number of relevant forest fires classified}}$$
2.5.2 F-Measure 


A measure that combines precision and recall for a small dataset is the F-measure, the weighted 
harmonic mean of precision and relevance. The traditional F-measure is called the balanced F-score, 

$$F = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}},$$

because recall and precision are evenly weighted. The general formula for nonnegative real β 
is based on van Rijsbergen's effectiveness E, given by 

$$\text{Alarm}_{rank} = 1 - \frac{1}{\alpha \frac{1}{P} + (1 - \alpha) \frac{1}{R}}$$

where P is the precision and R is the recall (relevance). 
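
To make the weighting concrete, the sketch below computes van Rijsbergen's effectiveness E = 1 - 1/(α/P + (1 - α)/R) and the corresponding F-measure F_β = 1 - E with α = 1/(1 + β²), which is the standard relation between the two. The function names and the example precision and recall values are hypothetical and are not taken from the chapter's data.

```python
def effectiveness(precision, recall, alpha):
    """van Rijsbergen's E; precision and recall must be greater than zero."""
    return 1.0 - 1.0 / (alpha / precision + (1.0 - alpha) / recall)


def f_beta(precision, recall, beta):
    """F_beta = 1 - E, using alpha = 1 / (1 + beta**2)."""
    alpha = 1.0 / (1.0 + beta ** 2)
    return 1.0 - effectiveness(precision, recall, alpha)


# Hypothetical query result with high precision and low recall.
p, r = 0.9, 0.3
print(f"F1   = {f_beta(p, r, 1.0):.3f}")   # evenly weighted
print(f"F0.5 = {f_beta(p, r, 0.5):.3f}")   # precision weighted more heavily
```

With β = 1 the expression reduces to the balanced F-score. With β = 0.5 precision receives more weight than recall, so a result with high precision and low recall scores higher than it does under the balanced measure.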


We look further into accidental small fires, as they are highly probabilistic, and any conceptual link 
to the attributes may help rank the precision and relevance of the collection. The alarm weight α 
is calculated based on the reliability of the ground truth: a higher precision weighting 
is given to large and medium fires than to accidental small fires. The precision weight factor 


x 
for small fires can then be evaluated using T 


2.5.3 Performance of Fire Topics Classification Using Temporal Ranking 


$$BA_{Days\%} < 50\ \text{ha} = \alpha + \beta \ln(VF) \tag{2.11}$$


where α = 2.895 and β = 1.265, which is the variance of the BA data versus fire activity, showing 
logarithmic O(lg(BA)) complexity as shown in Figure 2.3a. 

The broad classification of topics in the training sample collection [14] is reflected in two 
categories, accidental small fires and large wild fires. To evaluate the performance of the two, 
we use weighted precision versus relevance to estimate the ranking information F@(0.5) for the 
previous equation. The F-scores are calculated and weighted for high reliability by using F@(0.5), 
which weights precision twice as much as its equivalent relevance scale. Reliability and precision are 
proportionally weighted, while relevance is inversely proportional. The performance scores show 
that query evaluation for accidental small fires is 1.7 times higher than for queries for 
large wild fires from the same collection. The plot in Figure 2.3b shows queries for accidental 
and small fires for the temporal attribute day of the week, where the accidental small fire F-scores 
have much higher values. 

[Figure 2.3 plot: (a) Burnt area, with series Events and BA; (b) Week days (temporal), Mon through 
Sun, with series F@0.5 for large fires and F@0.5 for small fires.] 

Figure 2.3 Fire activity plot and its F-score transform. (a) Fire activity vs. burnt area yields a 
logarithmic relation and (b) twice the precision vs. relevance. 

Table 2.3 Performance of Ranking with False Alarm Rates 

Other Test  | Yellow Region   | Blue Region         | Temporal | FWI 
Alarm       | H = Hit         | H = Hit             | H = 311  | H = 69 
No alarm    | M = Miss        | M = Miss            | M = 111  | M = 3 
False alarm | F = False alarm | Z = Null hypothesis | F = 8    | F = 307 


2.5.4 Ranking Accidental Small Fires 


Accidental small wild fires are possible throughout the year, making them a viable application for 
automated sensor measurements. Measurements such as temperature, humidity, and wind 
gust are automated, while temporal attributes such as human traffic and day of the week are used 
to study the small fire events (Table 2.3). The peaks of the plot in Figure 2.4 suggest high alarm 
during weekends, followed by only one high-alarm day during the normal week. The weighting 
of the ranking suggests that small fire events are caused by temporal attributes such as human 
traffic and vehicular routes more than by any observed correlated sensor measurements. 


2.6 Alarm Ranking Function for Large Fires 


Figure 2.4 Fire alarm days ranked using F@(0.5). 

In the previous section we used BA and data from sensor networks to classify forest fires into four 
classes. In this section we instead use domain knowledge to predict fires more precisely by using 
FWI classes {low; moderate; high; very high}, as shown in Table 2.4. FWI is calculated using ISI and 
BUI, where ISI represents the Initial Spread Index and BUI represents the Build Up Index. ISI, BUI, 
and FWI indicate fire behavior and respectively represent the rate of fire spread, fuel consumption, 
and fire intensity. All FWI indexes are significantly correlated with the number of fires and the 
burned area, especially for large fires, where BA > 100. The average FWI variation 
during the year is shown in Figure 2.5a and b. It increases during May, peaks 
in August and September, and starts to fall in October. The following equation 
gives a numerical representation of BA when using FWI classes, in terms of the mean daily burned area 
per month and the mean daily number of fire events per month (NF). 

Table 2.4 FWI Classes 

FWI Types      | Measure (Normalized) 
Low (AF)       | 0-8 
Medium (SF)    | 8-13 
High (MF)      | 13-32 
Very high (LF) | 32 < FWI < 80 


$$\mathrm{BA}_{\mathrm{FWI}} > 50\ \mathrm{ha} = (\mathrm{BUI}) + (\mathrm{ISI})^{x} \qquad (2.12)$$


where ISI and BUI are calculated from the environment for a given fuel type, and x is the estimated
geometrical fire spread factor. The FWI index is highly correlated with the number of fires and the
BA. The plot in Figure 2.5 shows the correlated region when FWI > 33 (very high) in the case of
large fires.
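
The class boundaries of Table 2.4 can be sketched as a small lookup function. This is a minimal illustration, not the authors' implementation: the function name `fwi_class` is hypothetical, and the treatment of values that fall exactly on a boundary (8, 13, 32) is an assumption, since the table lists overlapping end points.

```python
def fwi_class(fwi: float) -> str:
    """Map a normalized FWI value to the class labels of Table 2.4.

    Assumption: the lower classes include their lower bound and exclude
    their upper bound, except that 32 itself still counts as "High",
    because the table gives "Very high" as 32 < FWI < 80.
    """
    if fwi < 0:
        raise ValueError("FWI must be non-negative")
    if fwi < 8:
        return "Low (AF)"
    if fwi < 13:
        return "Medium (SF)"
    if fwi <= 32:
        return "High (MF)"
    # The upper limit of 80 from the table is not enforced in this sketch.
    return "Very high (LF)"


if __name__ == "__main__":
    for value in (5, 8, 20, 32, 45):
        print(value, fwi_class(value))
```

A different boundary convention would only change how the three edge values are assigned.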


2.6.1 Performance of Fire Topics Classification Using FWI Ranking 


Large fires, each damaging more than 50 ha, account for the majority of the total BA (ha). Avoiding
large fire incidents is therefore a high priority for forest conservation. Because they are hard to
detect and have a varying threshold, they are also a cause of false alarms [15] in an automated
system. Plotting all the correlated FWI components that relate to fire activity, Figure 2.5a shows
that the peak months have a gradual increase of the FWI index, which is also correlated with large
fire incidents. The area of high correlation is shown in yellow, with the lowest false alarm rate
for a given FWI threshold. The lower bound conditions for fire activity are shown in blue; these
have a higher false alarm rate due to the valid area above the yellow region.

The temporal correlation for large fires using the FWI F-score is plotted in Figure 2.5b. It shows 
that large fires are invariant to temporal changes and perform better than small fires for a given 
precision and recall measure. 


2.6.2 Misclassification and Cost of False Alarms 


Decision made
 | LargeFire | SmallFire
LargeFire | 0 | 1000
SmallFire | 1 | 0


Cost function matrix for misclassification 


Figure 2.5 The lower bound range FWI <= 33 (blue) and the higher bound range FWI > 33 (yellow)
show that all large fires (dotted line) fall into the yellow region. (a) FWI-based classification
and (b) FWI for large fires shows invariance.


While testing the performance of the algorithm, error rates are simulated, which allows us to find
the sensitivity of the system to false positives. False positives are more significant at higher
bound values such as large fires, which are very rare and hard to classify. We define a hit in
terms of precision to avoid false alarms ([15] and Table 2.1). From the given cost matrix we show
that the alarms are very precise when using FWI, whereas the alarms are more accurate when using
temporal attributes but include a large number of false positives.
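
To make the role of the cost matrix concrete, the sketch below weights each cell of a two-class confusion matrix by its misclassification cost. It assumes that the rows of the cost matrix are the actual class and the columns are the decision made; the source does not state the orientation explicitly. The counts in `example` are hypothetical and are not taken from the chapter.

```python
# Misclassification costs from the cost function matrix above.
# Keys are (actual class, decision made); this orientation is an assumption.
COST = {
    ("LargeFire", "LargeFire"): 0,
    ("LargeFire", "SmallFire"): 1000,  # missing a large fire is expensive
    ("SmallFire", "LargeFire"): 1,     # a false large-fire alarm is cheap
    ("SmallFire", "SmallFire"): 0,
}


def total_cost(confusion: dict) -> int:
    """Sum cost * count over every (actual, decided) cell."""
    return sum(COST[cell] * count for cell, count in confusion.items())


# Hypothetical counts, for illustration only.
example = {
    ("LargeFire", "LargeFire"): 20,
    ("LargeFire", "SmallFire"): 4,
    ("SmallFire", "LargeFire"): 30,
    ("SmallFire", "SmallFire"): 463,
}

if __name__ == "__main__":
    print(total_cost(example))
```

With costs this asymmetric, a handful of missed large fires dominates the total even when small-fire false alarms are far more frequent.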


2.7 Machine Learning Algorithms 


Probabilistic algorithms, when used with density estimation and classification, yield the lowest
error. This provides a baseline analysis of the system attributes being used.


2.7.1 Naive Bayes


Naive Bayes [16] by design presumes that the a priori class densities have been determined and are
accurate. The model calculates the class conditional probabilities of the input feature vectors.
To understand the underlying skewed structure of the dataset, we further create thresholds for
accidental small fires compared to medium and large fires, as shown in Table 2.6. We thus have
four possible values for the target variable, as given in Equation 2.7.


2.7.2 User Query 


To validate the model, let us predict the fire activity outcome of a peak summer month from the
dataset [17]. August has a significant number of reported fires compared to other months. We
estimate the unknown probabilities (?) using temporal features of fire events, given the attribute
values for the class.


? = {Month = August; Day = Monday} 
{Temperature = Cool; Humidity = High, Wind = True} 


The estimated class conditional densities for the independent variables temperature, humidity, and
wind conditions are calculated using the temporal attribute month for the dataset shown in
Table 2.5. The dataset is further explored using two temporal variables, month and the day of the
week, as shown in Tables 2.7 and 2.8. The temporal variables introduced into the dataset help gain


Table 2.5 Posterior Probabilities for Background Weather Data for the Peak Month August

Prior Predictor

Burnt Area (ha) | August | Monday | Temperature | Humidity | Windy | Probability | Variance (%)
<1 ha | 0.34 | 0.14 | 0.46 | 0.17 | 0.42 | 0.47 | 57
>1 ha <=10 ha | 0.39 | 0.15 | 0.35 | 0.13 | 0.38 | 0.33 | 25.0
>10 ha <=50 ha | 0.30 | 0.14 | 0.43 | 0.19 | 0.50 | 0.13 | 17.0
>50 ha | 0.33 | 0.08 | 0.16 | 0.08 | 0.37 | 0.04 | 0.02


Table 2.6 Target Variable Occurrences

Fire Types | Recorded
Accidental (AF) | 247
Small (SF) | 175
Medium (MF) | 71
Large (LF) | 24


Table 2.7 Likelihood of Fires for the Month of August

Fire Type | Month=August
Accidental | 0.004
Small | 0.002
Medium | 0.001
Large | 0.00004


Table 2.8 Posterior Probabilities for Temporal Feature Day of the Week

Days | Accidental | Small | Medium | Large
Mon | 35 | 27 | 10 | 2
Tue | 28 | 21 | 11 | 4
Wed | 22 | 24 | 5 | 3
Thu | 30 | 21 | 9 | 1
Fri | 42 | 31 | 12 | 0
Sat | 42 | 24 | 11 | 7
Sun | 48 | 27 | 13 | 7
Total | 247 | 175 | 71 | 24


insight into users’ dependencies on the fire prediction model.


$$g_i(x) = P(\omega_i \mid x) = \frac{p(x \mid \omega_i)\,P(\omega_i)}{\sum_{j} p(x \mid \omega_j)\,P(\omega_j)} \qquad (2.13)$$


Substituting the corresponding highlighted values from Tables 2.5 through 2.8 into Equation 2.13,
we get the posterior probabilities of accidental, small, medium, and large fires.


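$$\widehat{\mathrm{fire}}_{\mathrm{accidental}} = \frac{0.0007547}{0.003565} \qquad (2.14)$$

$$\widehat{\mathrm{fire}}_{\mathrm{small}} = \frac{0.000333}{0.003565} \qquad (2.15)$$

$$\widehat{\mathrm{fire}}_{\mathrm{medium}} = \frac{0.000223}{0.003565} = 0.17\% \qquad (2.16)$$

$$\widehat{\mathrm{fire}}_{\mathrm{large}} = \frac{0.000000287}{0.003565} \qquad (2.17)$$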

From the posterior probabilities for the month of August for the data collected in Portugal [17],
the likelihood of accidental small fires is very high. Cross-validating against the known fact
that the likelihood of wild fires is higher in summer, the Bayes rule is able to classify the
dataset for accidental and small fires with high accuracy. We use a simulation framework in the
following sections to further support our initial conclusion from the datasets. The training time
for Naive Bayes scales linearly in both the number of instances and the number of attributes.
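
The Naive Bayes calculation for the user query can be sketched in a few lines. The factors below are copied from Table 2.5; the names `TABLE_2_5` and `posteriors` are hypothetical. One assumption matters here: the sketch normalizes over the four fire classes only, whereas the printed equations divide by 0.003565, so the resulting percentages are not expected to match the printed ones. Only the ranking of the classes is being illustrated.

```python
from math import prod

# Per-class factors from Table 2.5, in column order:
# (August, Monday, Temperature, Humidity, Windy, prior probability)
TABLE_2_5 = {
    "accidental": (0.34, 0.14, 0.46, 0.17, 0.42, 0.47),
    "small":      (0.39, 0.15, 0.35, 0.13, 0.38, 0.33),
    "medium":     (0.30, 0.14, 0.43, 0.19, 0.50, 0.13),
    "large":      (0.33, 0.08, 0.16, 0.08, 0.37, 0.04),
}


def posteriors(table: dict) -> dict:
    """Multiply the independent factors per class, then normalize.

    Assumption: the evidence term is the sum over the four classes
    listed in the table (the usual Naive Bayes normalization).
    """
    scores = {label: prod(factors) for label, factors in table.items()}
    evidence = sum(scores.values())
    return {label: score / evidence for label, score in scores.items()}


if __name__ == "__main__":
    for label, p in posteriors(TABLE_2_5).items():
        print(f"{label:10s} {p:.4f}")
```

Under this normalization the accidental class receives the largest posterior, which agrees with the conclusion drawn in the text.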


2.7.3 Tree Classifier 


In this section, we focus on the domain rules that are applicable to the learning system. Tree
classifiers lend themselves to ML rules [11] when searching the hypothesis space by further
branching on specific attributes. The design of such a classifier needs to sort the weights or
entropies [16] of the attributes, which is the basis of its classification effectiveness.

ID3 is a popular tree-classifier algorithm; we implement ID3 with our attributes as illustrated in
Figure 2.6a and Table 2.9. Let S be a collection of samples. The tree algorithm uses entropy to
split its levels, and the entropy is given by


$$\mathrm{Entropy}(S) = -\sum_{i=0}^{c} p(i)\,\log_2 p(i) \qquad (2.18)$$


Let us assume a collection (S) has 517 samples [17] with 248, 246, 11, and 12 of accidental, small, 
medium, large fires, respectively. The total entropy calculated from Equation 2.18 is given by 


$$\mathrm{Entropy}(S) = -\left(\frac{248}{517}\log_2\frac{248}{517} + \frac{246}{517}\log_2\frac{246}{517} + \frac{11}{517}\log_2\frac{11}{517} + \frac{12}{517}\log_2\frac{12}{517}\right) = 1.23$$
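
The entropy of a collection can be checked numerically from its class counts. The sketch below is a direct transcription of Equation 2.18 using base-2 logarithms; the function name `entropy` and the variable `fire_counts` are illustrative only.

```python
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a class distribution given as counts.

    Classes with a zero count contribute nothing to the sum.
    """
    total = sum(counts)
    return -sum((n / total) * log2(n / total) for n in counts if n > 0)


# Class counts quoted in the text: accidental, small, medium, large.
fire_counts = [248, 246, 11, 12]

if __name__ == "__main__":
    print(round(entropy(fire_counts), 3))
```

Evaluated directly, these counts give roughly 1.26 bits, close to the value of 1.23 quoted above.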


Figure 2.6 Weka algorithm toolkit. (a) Tree classifier and (b) Bayes network.


Table 2.9 Gain Ratio Calculation for Tree Using Entropy

Month | Temperature | Wind
Not shown | Info: 1.08 | Info: 1.20
Not shown | Gain: 1.23 − 1.08 = 0.192 | Gain: 1.23 − 1.20 = 0.025


2.7.4 Attribute Selection


ID3 uses a statistical property called information gain to select the best attribute. The gain
measures how well the attribute separates the training examples by their target class when
classifying them into fire events. The measure of purity that we use is called information and is
measured in units called bits. It represents the expected amount of information that would be
needed to specify whether a new instance should be classified as an accidental, small, medium, or
large fire, given that the example reached that node. The gain of an attribute is defined by
Equation 2.19 and illustrated in Table 2.9. Using the calculated information gain, we show that
the temperature attribute is used before the wind attribute to split the tree after the tree root.


$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{i=0}^{c} \frac{|S_i|}{|S|}\,\mathrm{Entropy}(S_i) \qquad (2.19)$$


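$$\mathrm{Entropy}(S_{\mathrm{Hot}}) = -\left(\frac{9}{36}\log_2\frac{9}{36} + \frac{23}{36}\log_2\frac{23}{36} + \frac{3}{36}\log_2\frac{3}{36} + \frac{1}{36}\log_2\frac{1}{36}\right) = 1.282$$

$$\mathrm{Entropy}(S_{\mathrm{Cool}}) = -\sum_{i} \frac{n_i}{269}\log_2\frac{n_i}{269}$$

(four class terms over 269 samples; the class counts n_i are not legible in the source)

$$\mathrm{Entropy}(\mathrm{temp}) = \frac{43}{517}\times 1.282 + \frac{139}{517}\times 1.175 + \frac{335}{517}\times 1.05 = 1.08$$

$$\mathrm{Gain}(S, \mathrm{temp}) = 1.23 - 1.08 = 0.192$$

$$\mathrm{Entropy}(S_{\mathrm{High}}) = -\left(\frac{162}{249}\log_2\frac{162}{249} + \frac{72}{249}\log_2\frac{72}{249} + \frac{8}{249}\log_2\frac{8}{249} + \frac{7}{249}\log_2\frac{7}{249}\right) = 1.1952$$

$$\mathrm{Entropy}(S_{\mathrm{Low}}) = -\sum_{n \in \{68,\,59,\,2,\,4\}} \frac{n}{N}\log_2\frac{n}{N} = 1.24$$

(the subset size N is not legible in the source)

$$\mathrm{Entropy}(\mathrm{wind}) = \frac{361}{517}\times 1.1952 + \frac{156}{517}\times 1.24 = 1.20$$

$$\mathrm{Gain}(S, \mathrm{wind}) = 1.23 - 1.20 = 0.025$$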

The internal tree representation for m attributes from n samples will have a complexity of
O(lg n). With increasing inputs given by parameter n, the height of the tree will not grow
linearly as in the case of Naive Bayes. On the other hand, the complexity of building a tree will
be O(n lg n).
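
The attribute-selection step of Equation 2.19 can be sketched as follows. This is an illustration of the general information-gain rule, not the WEKA or ID3 source code, and the names `information_gain` and `best_attribute` are hypothetical. Because several class counts in the worked fire example are not recoverable, the demonstration uses the widely reproduced two-class "play tennis" toy data (9 positive and 5 negative samples), which is not from this chapter.

```python
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a class distribution given as counts."""
    total = sum(counts)
    return -sum((n / total) * log2(n / total) for n in counts if n > 0)


def information_gain(parent_counts, child_counts):
    """Entropy of the parent minus the size-weighted entropy of the subsets.

    child_counts holds one list of class counts per attribute value.
    """
    total = sum(parent_counts)
    remainder = sum(sum(child) / total * entropy(child) for child in child_counts)
    return entropy(parent_counts) - remainder


def best_attribute(parent_counts, splits):
    """Return the attribute name whose split yields the highest gain."""
    return max(splits, key=lambda name: information_gain(parent_counts, splits[name]))


# Toy data (not from the chapter): class counts are [positive, negative].
PARENT = [9, 5]
SPLITS = {
    "wind": [[6, 2], [3, 3]],      # weak, strong
    "humidity": [[3, 4], [6, 1]],  # high, normal
}

if __name__ == "__main__":
    for name, children in SPLITS.items():
        print(name, round(information_gain(PARENT, children), 3))
    print("split on:", best_attribute(PARENT, SPLITS))
```

The same rule explains the ordering in the text: temperature, with the larger reported gain (0.192), is chosen before wind (0.025).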


2.8 Simulation 


The open-source workbench WEKA [4] is a useful tool to quantify and validate results in a way that
can be duplicated. WEKA handles numeric attributes well, so we use the same values for the weather
data from the UCI [6] repository datasets. The class variable has to be nominal for WEKA [4] to
classify it, so we convert all fire types to “0” or “1,” where “0” is for accidental small fires
and “1” is for large fires, making it a two-class classifier; the results are shown as confusion
matrices in Tables 2.11 through 2.13. Naive Bayes correctly classifies accidental and small fires
(209 out of 247), while the J48 tree classifier does better (219 out of 247) and SVM reaches the
highest precision (235 out of 247).

As WEKA uses the kappa statistic [4] internally for evaluating the training sets, a standard score
of >60% means the training set is correlated. Using the J48 simulation we get 53.56%, just below
the correlated index, and when using SVM we get 68%, above it. The comparison of results shows
that the J48 tree classifier does better than Naive Bayes by 25% and the corresponding SVM does
35% better overall, showing the least bias of the three models. Therefore, using sensor network
measurements, accidental and small fires can be predicted with high precision using the SVM
classifier.
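
The kappa values quoted above can be recomputed from the confusion matrices. The sketch below uses the standard Cohen's kappa definition (observed agreement corrected for chance agreement); that this is the statistic WEKA reports is an assumption, although the values it produces for the matrices of Tables 2.14 and 2.16 agree with the kappa entries of Tables 2.11 and 2.12. The function name `cohen_kappa` is illustrative.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix.

    Assumption: rows are actual classes and columns are predicted classes;
    kappa itself is unchanged if the matrix is transposed.
    """
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    return (observed - expected) / (1 - expected)


# Class order LF, MF, SF, AF.
J48_TRAIN = [  # Table 2.14
    [7, 0, 7, 10],
    [0, 29, 15, 27],
    [1, 7, 118, 49],
    [0, 5, 23, 219],
]
SVM_TRAIN = [  # Table 2.16
    [7, 0, 5, 12],
    [0, 31, 15, 25],
    [0, 1, 148, 26],
    [0, 0, 12, 235],
]

if __name__ == "__main__":
    print("J48 kappa:", round(cohen_kappa(J48_TRAIN), 4))
    print("SVM kappa:", round(cohen_kappa(SVM_TRAIN), 4))
```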


2.8.1 Simulation Analysis 


This section gives the WEKA attribute statistics for the training set and the cross-validated
testing, together with the effective correlation (kappa) score. Tables 2.10 through 2.12 show
kappa and other comparison statistics for the Naive Bayes, J48 tree, and Support Vector Machine
classifiers for small fires. The experiment is repeated using FWI, and the results for the J48
tree classifier are shown in Table 2.14.


2.8.2 Error Analysis 


Equation 2.11 specifies the regression model error, and the corresponding confusion matrices from
the simulation scores are shown in Tables 2.13 through 2.16. The upper bound of small fires
(AF+SF) has over 90% precision for SVM, 80% accuracy for the J48 tree, and 61% overall for Naive
Bayes. The corresponding baseline performance including all fire categories is 82% for SVM, 72.1%
for the J48 tree, and 51.64% for Naive Bayes. The lower scores are due to bias toward small fires;
only SVM is, by design, the least biased (Tables 2.17 through 2.19).

When FWI classification [17], as given in Equation 2.12, is used for large fire prediction, it is
more precise, as shown in the confusion matrix in Table 2.20. The percentage of correctly
classified instances is >95%, making it reliable with few false alarms.
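
Per-class precision, recall, and F-score can be read off a confusion matrix as sketched below. Two assumptions are made: rows are actual classes and columns are predicted classes, and the F-score is the general F-beta form with `beta` left as a free parameter. Whether the chapter's F@0.5 measure corresponds to beta = 0.5 is not stated in this section, so that mapping should be treated as an assumption. The function name `class_scores` is hypothetical.

```python
def class_scores(matrix, k, beta=1.0):
    """Return (precision, recall, F-beta) for class index k.

    Assumption: matrix[i][j] counts samples of actual class i
    that were predicted as class j.
    """
    true_positive = matrix[k][k]
    predicted = sum(row[k] for row in matrix)  # column sum
    actual = sum(matrix[k])                    # row sum
    precision = true_positive / predicted if predicted else 0.0
    recall = true_positive / actual if actual else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta


# Table 2.16 (SVM, training set); class order LF, MF, SF, AF.
SVM_TRAIN = [
    [7, 0, 5, 12],
    [0, 31, 15, 25],
    [0, 1, 148, 26],
    [0, 0, 12, 235],
]

if __name__ == "__main__":
    for index, label in enumerate(["LF", "MF", "SF", "AF"]):
        p, r, f = class_scores(SVM_TRAIN, index, beta=0.5)
        print(f"{label}: precision={p:.3f} recall={r:.3f} F(beta=0.5)={f:.3f}")
```

A beta below 1 weights precision more heavily than recall, which matches the chapter's stated emphasis on avoiding false alarms.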


Table 2.10 Evaluation on Training Set for Naive Bayes

WEKA Stats | Results | Summary (%)
Correctly classified instances | 267 | 51.64
Incorrectly classified instances | 250 | 48.35
Kappa statistic | 0.1371 |
Mean absolute error | 0.3022 |
Root mean squared error | 0.3902 |
Relative absolute error | 94.86% |
Root relative squared error | 97.84% |
Total number of instances | 517 |


Table 2.11 Evaluation on Training Set for J48 Tree Classifier

WEKA Stats | Results | Summary (%)
Correctly classified instances | 373 | 72.14
Incorrectly classified instances | 144 | 27.85
Kappa statistic | 0.5356 |
Mean absolute error | 0.1938 |
Root mean squared error | 0.3113 |
Relative absolute error | 60.83% |
Root relative squared error | 78.04% |
Total number of instances | 517 |


Table 2.12 Evaluation on Training Set for SVM Linear Classifier

WEKA Stats | Results | Summary (%)
Correctly classified instances | 421 | 81.43
Incorrectly classified instances | 96 | 18.56
Kappa statistic | 0.6893 |
Mean absolute error | 0.0928 |
Root mean squared error | 0.3047 |
Relative absolute error | 29.14% |
Root relative squared error | 76.40% |
Total number of instances | 517 |


Table 2.13 Confusion Matrix for Naive Bayes Using Training Set

 | LF | MF | SF | AF
LF | 0 | 1 | 7 | 16
MF | 0 | 5 | 12 | 54
SF | 0 | 7 | 53 | 115
AF | 0 | 0 | 38 | 209


Table 2.14 Confusion Matrix on Training Set for J48 Tree Classifier

 | LF | MF | SF | AF
LF | 7 | 0 | 7 | 10
MF | 0 | 29 | 15 | 27
SF | 1 | 7 | 118 | 49
AF | 0 | 5 | 23 | 219


Table 2.15 Confusion Matrix on Testing Set for J48 Tree Classifier

 | LF | MF | SF | AF
LF | 0 | 2 | 8 | 14
MF | 2 | 7 | 2 | 40
SF | 3 | 12 | 71 | 89
AF | 4 | 15 | 59 | 168


Table 2.16 Confusion Matrix on Training Set for SVM Linear Classifier

 | LF | MF | SF | AF
LF | 7 | 0 | 5 | 12
MF | 0 | 31 | 15 | 25
SF | 0 | 1 | 148 | 26
AF | 0 | 0 | 12 | 235


Table 2.17 Confusion Matrix on Testing Set Using SVM

 | LF | MF | SF | AF
LF | 0 | 0 | 7 | 17
MF | 0 | 1 | 17 | 53
SF | 0 | 3 | 42 | 130
AF | 0 | 4 | 51 | 192




Table 2.18 Confusion Matrix on Training Set Using Bayes Network

 | LF | MF | SF | AF
LF | 0 | 0 | 5 | 19
MF | 0 | 4 | 5 | 62
SF | 0 | 4 | 30 | 141
AF | 0 | 0 | 23 | 224


Table 2.19 Confusion Matrix on Testing Set Using Bayes Network

 | LF | MF | SF | AF
LF | 0 | 0 | 5 | 19
MF | 0 | 2 | 8 | 61
SF | 0 | 4 | 23 | 148
AF | 0 | 2 | 28 | 217


Table 2.20 Confusion Matrix on Training Set for FWI > 32

 | Very High | High
Very high | 371 | 0
High | 17 | 0


2.9 Correlation of Attributes 


From a statistical point of view, if the attributes have similar values, this creates high bias,
leading to what is called over-fitting error during learning. In our case, temperature and
humidity may have similar values; this needs to be avoided by substituting a suitable attribute.
To pre-process and analyze the data, we use all the available sensor measurements in the dataset,
and WEKA provides the attribute selection as illustrated in Table 2.21.

We use the attribute selection wizard of WEKA to find the best match. The analysis shows how
strongly the precision depends on Month (100%), Day (10%), and Wind (0%). In a two-class
classification, the quantitative data are biased toward small fires, and SVM does better due to
better generalization (Table 2.22). In the qualitative analysis, which is based on the frequency
of attributes, WEKA picks Month, which is a temporal type. The F-score of small fires is higher
when using temporal attributes, as shown in Figure 2.3b, which is also true for the WEKA
predictions.


Table 2.21 Attribute Selection 10-Fold Cross-Validation (Stratified)

Number of Folds (%) | No. | Attribute
10 (100%) | 1 | Month
1 (10%) | 2 | Day
0 (0%) | 3 | Temp
0 (0%) | 4 | RH
0 (0%) | 5 | Wind


Table 2.22 Evaluation FWI > 32 on Training Set for J48 Tree Classifier

WEKA Stats | Results | Summary (%)
Correctly classified instances | 371 | 95.61
Incorrectly classified instances | 17 | 4.38
Kappa statistic | 463.64 |
Mean absolute error | 0.0838 |
Root mean squared error | 0.2047 |
Relative absolute error | 97.5159% |
Root relative squared error | 99.9935% |
Total number of instances | 388 |


2.10 Better Model for Weather Data 


In the error analysis, we show that the true error performance depends on the learning algorithms
[1] and how the weights are learnt. As all the data mining algorithms assume samples to be
i.i.d., they cannot perform well with highly correlated weather data. In the domain of weather
data, where features tend to be highly correlated, careful feature selection and a better model
are needed to address the shortcomings of the earlier models. A Bayes network allows us to define
a better model, with classes and events that are dependent on or independent of each other. This
model makes it possible to distinguish the overlapping statistics of correlated and similar
features having the same range and values. The basic Bayes net performs well in testing, with 88%
precision as shown in Figure 2.7a and b, and has accuracy similar to the other models,
demonstrating its discriminative power for the underlying data.


2.11 Future Work 


As machine learning grows in popularity, the scalability of its family of algorithms needs to be
studied. A large-scale, high-speed system is needed to perform k-nearest neighbor


Figure 2.7 Plot of F-score for each model and its performance with weather training data. (a)
ML algorithm performance and (b) Bayes network.


searches. The availability of such a system is expected to allow more flexible modeling approaches
and much more rapid model turnaround for exploratory analysis. We would like to explore the Hadoop
ecosystem to develop machine learning algorithms that can sort the best 1000 records from a
reference data set of 20 million profiles in less than 3 h. Some initial work using the Mahout
machine learning framework is proposed to port the described algorithms to a distributed framework
running on Hadoop clusters.


2.12 Summary 


We use the query log approach of search engines and a standard statistical ranking measure to do a
baseline analysis, and use data from inexpensive sensors to validate against probabilistic ML
algorithms. From Table 2.23, we show that, for precision and accuracy as denoted, the F-measures
match


Table 2.23 F-Measure Performance for All Tests Compared with WEKA

Experiments | Model Performance Overall | F@0.5 Measure, Small Fire | F@0.5 Measure, Large Fire | Confusion Matrix, Small Fire | Confusion Matrix, Large Fire
Ranking - Temporal | Bias | 0.220 | 0.17 | - | -
Ranking - FWI | Bias | 0.1 | 0.90 | - | -
Weka (NB) - Temporal | Generative | 0.68 | 0.1 | 0.6 | 0.3
Weka (Bayes Net) - Temporal | Better model | 0.889 | - | 0.88 | 0.0
Weka (J48) - Temporal | Decision tree | 0.78 | 0.5 | 0.75 | 0.3
Weka SVM - Temporal | Generalized | 0.879 | 0.7 | 0.90 | 0.3
Weka (J48) - FWI | Invariant | - | 0.950 | 0.1 | 0.90

the expected WEKA simulation statistics. FWI is able to boost the weak performance of the raw
data from sensors, which are typically hard to calibrate. In the qualitative analysis of small
fires, a generalized classifier such as SVM is preferred. Similarly, the qualitative performance
for large fires is achieved by careful attribute selection, and we show that an invariant
attribute selection such as (FWI > 50) yields high classification precision. The use of a better
model such as a Bayes network in the case of weather data helps to increase the performance.


3.1 Introduction


The contemporary world presents novel challenges for distributed measurement and control (DMC) 
systems. Today's DMC applications are expected to solve complicated tasks in an autonomous 
mode, which requires a substantial level of intelligence. Thus, effective networking of smart and 
intelligent sensors is currently an active focus of research and development. 

Most of the smart transducers in use today have a structure similar to that shown in Figure 3.1.
They generally consist of four major components: the actual transducers (sensors and/or actuators),
a signal conditioning and data conversion system, an application processor, and a network
communication system (the last two are often considered as a whole, i.e., the so-called
network-capable application processor [NCAP]) [1]. To call these types of smart transducers
intelligent, these transducers should support one or several intelligent functions such as
self-testing, self-identification,


Figure 3.1 Typical structure of a smart transducer.


self-validation, and self-adaptation [2]. Naturally, implementation of these functions introduces 
special requirements, both for the actual transducers and for the communication interface between 
the processor (or NCAP) and the transducers. 

Besides the actual on-node intelligent function support, networking for an intelligent sensor
requires a universal communication mechanism between the nodes and a uniform means of data
representation. Indeed, regardless of the size of the sensor network (SN), each node should be able
to get in touch with the required sensors, request and receive the required measurement results,
understand them, and react accordingly (see Figure 3.2). The problem of SN node interoperability
gains additional importance when the Internet Protocol (IP) is used for SNs within the Internet of
Things concept [3].

Therefore, the following two major components of the intelligent sensor (transducer) interfacing
problem should be considered: 


1. The actual physical and electrical interfacing of transducers to processors or NCAPs and 
implementation of intelligent sensor functions for these 
2. NCAP interfacing to other devices within the intelligent SN and data representation 


Both of these aspects are discussed in detail in the current chapter. 


3.2 Transducer Interfacing to Intelligent Transducer's Processor 


As shown in Figure 3.1, a smart transducer is a complex system that is built around an NCAP 
and the attached transducers. Depending on the application, the interface between the NCAP 
and the transducers can be implemented as a peer-to-peer (P2P) or a network connection over 
wired or wireless media [1]. Nonetheless, the intelligent transducer concept introduces some basic 
requirements for this interface; namely, the support of required intelligent transducer features. The 
most important capabilities that are required from an intelligent NCAP-transducer interface are
as follows [1,4]:


■ The standardized physical and electrical connection between the transducer and the NCAP

■ The support for detection of transducer connection/disconnection and identification of the
transducer by the NCAP

■ The support for transducer diagnosis by the NCAP


These features allow connection of the transducers to the NCAP within an intelligent transducer 
in a simple “plug-and-play” (P&P) mode, where the NCAP automatically detects the attached 
transducers, calibrates them, and puts them into use. The transducer P&P support also allows 
a significant simplification of the development and maintenance of the SN application and an 
increase in system interoperability and adaptability. 

We provide a comprehensive picture for the problem of interfacing of a transducer to an NCAP 
in Sections 3.2.1 through 3.2.3. There, we discuss the plain transducer interfaces in widest use 
today, the current smart transducer interface standards, and the implementation strategies for the 
most important intelligent transducer interface features over plain interfaces. 


3.2.1 Existing Plain Transducer Interfaces 


The presence of a large number of transducer manufacturers has catalyzed the advances in DMC
systems in recent years. However, this has resulted in a lack of general agreement on low-level
transducer interfaces, which makes the integration of transducers into multivendor networks a
rather challenging task [1,4]. The transducers currently available on the market use either analog
or a wide range of different digital and quasi-digital interfaces. According to Ref. [2] and the
International Frequency Sensor Association (IFSA), the proportion of sensors on today’s global
sensor market is 55%, 30%, and 15% for analog, digital, and quasi-digital output sensors,
respectively. Of the digital sensors, the most widespread types for general-purpose applications
are presently the ones utilizing I2C, SPI, and 1-wire interfaces [5]. Illustrations of the most
widely used interfaces are presented in Figure 3.3a through f.


Figure 3.3 The most widely used general-purpose interfaces for transducers. (a) Analog interface.
(b) UART interface. (c) Quasi-digital interfaces. (d) 1-wire interface. (e) SPI interface.
(f) I2C interface.


The sensors with analog interfaces are currently the most widespread [2]. The main advantages of
these sensors are their simplicity and low price, although they do require analog-to-digital
converters (ADCs) before the measurements are sent to the digital processing device. If a
microcontroller is used as the digital processing device (microcontrollers are currently the most
widely used embedded systems [6]), it typically already has inbuilt ADCs; otherwise, an external
ADC should be used. The physical connection of sensors with analog outputs to an ADC (see
Figure 3.3a) usually utilizes only a single wire (not considering the sensor power supply lines),
although this line can require special shielding to prevent induced noise from adding error to the
measured value. Typically, several analog sensors would not be connected to an ADC over the same
physical line at the same time. Obviously, the sensors with analog interfaces cannot provide any
mechanisms for their identification.

The quasi-digital sensors, as defined in Ref. [7], are “discrete frequency-time domain sensors 
with frequency, period, duty-cycle, time interval, pulse number, or phase-shift output.” Usually, 
these sensors utilize standardized digital signal voltage levels with special modulation for representing 
output data. As Ref. [2] reveals, the most widespread quasi-digital sensors currently use frequency
modulation (FM) (70% of those available on the market), pulse-width modulation (PWM) (16%), and
duty-cycle (9%) modulation, although some sensors also use pulse number (3%), period (1%), and
phase-shift (1%) modulations. Depending on output signal modulation (see, e.g., Figure 3.3c), a
quasi-digital interface will require either one (for FM, PWM, duty cycle, pulse number, and period 
modulation) or two (for phase-shift modulation) physical lines. Similar to analog sensors, the 
quasi-digital sensors usually have a rather simple structure and low cost, but they are less vulnerable 
to noise and interference and are capable of providing higher accuracy [2,8]. Another advantage 
of quasi-digital sensors is that they do not require ADCs, although the measurement reading by 
processing systems requires some additional effort. Like the analog versions, several quasi-digital 
sensors usually cannot be connected to the same physical line. 

The sensors with full-featured digital output convert the measurement results to digital form
on-chip. Once converted, digital measurement values can be accessed by the processing device using
an appropriate communication interface. Currently, a wide variety of digital interfaces are
available for transducers. Among them are the general-purpose ones (e.g., I2C, SPI, and 1-wire) and
a wide range of application-specific ones (e.g., CAN-bus for automotive, Fieldbus for industry,
DALI for light control, etc.) [9]. Most of these interfaces utilize serial communication and
provide some networking capabilities for interconnecting several sensors over the same physical
lines. In these cases, the transducers are usually implemented as “slave” devices and can only
reply to requests from the “master” device (usually an NCAP). Obviously, digital sensors have a
more complicated structure than analog and quasi-digital sensors, which results in their higher
price. Nonetheless, digital sensors have numerous advantages, including connection simplicity and
support for some smart features (e.g., sensor self-calibration or inbuilt data processing
capabilities) that are of particular value.

We will briefly discuss the communication through I2C, SPI, 1-wire, and UART interfaces, as
these interfaces are the most widespread general-purpose digital communication interfaces and are
widely utilized by different sensors [5].

The typical connection of sensors to an NCAP over an Inter-Integrated Circuit (I2C) interface
and an I2C data format are presented in Figure 3.3f. As this figure shows, the I2C interface uses
two common physical lines for the clock (SCLK) and data (SDA), which are pulled up with resistors
(Rp) [10]. The I2C interface can be used to connect multiple slave devices to one master device;
therefore, the master device has to initiate the communication by sending the start bit (S) and the
address (which usually consists of 7 bits, although addresses of 10 bits are also defined in recent
I2C revisions) of the required slave device. Together with the slave device’s address, the master
usually transmits the 1-bit Read/Write (R/W) flag for defining the communication direction, which
is used until the stop bit (P) closes the current session. The I2C communication protocol
implements per-byte acknowledgments (A/Ā). Note that the addresses of I2C devices are not unique
and that, depending on the physical connection, one slave device can use multiple (usually 4 or 8)
different I2C addresses. This prevents single-valued identification of an I2C device based only on
its address. The most commonly used data rates for I2C sensors are between 10 and 400 kbit/s,
although recent I2C revisions also support rates of 1 and 3.4 Mbit/s.
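
The first byte that the master sends after the start bit can be sketched as follows: the 7-bit slave address occupies the upper seven bits and the R/W flag the lowest bit, with 1 meaning read and 0 meaning write. The function name `i2c_address_byte` and the example address 0x48 are hypothetical and do not refer to any particular sensor discussed in the chapter.

```python
def i2c_address_byte(address: int, read: bool) -> int:
    """Build the first byte of an I2C transfer for a 7-bit slave address.

    The address is shifted left by one bit; the least significant bit
    carries the R/W flag (1 = read from slave, 0 = write to slave).
    """
    if not 0 <= address <= 0x7F:
        raise ValueError("a 7-bit I2C address must be in the range 0x00-0x7F")
    return (address << 1) | (1 if read else 0)


if __name__ == "__main__":
    sensor_address = 0x48  # hypothetical slave address, for illustration only
    print(hex(i2c_address_byte(sensor_address, read=False)))
    print(hex(i2c_address_byte(sensor_address, read=True)))
```

Because the address field is only seven bits wide and is often set by pin strapping, two different device types can legitimately answer to the same value, which is why the address alone cannot identify a device.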

The serial peripheral interface (SPI) is a synchronous serial interface that can operate in full 
duplex mode [11]. The typical method for connection of several SPI slave devices to the master is 
presented in Figure 3.3e. As the figure shows, the SPI bus utilizes three common lines for all slave 
devices: clock (SCLK); master output, slave input (MOSI); master input, slave output (MISO); 
and a separate chip select (CS) line for each slave. Therefore, before starting the communication, 
the SPI master device pulls down the CS line of the required slave device to select it. The SPI
specification defines neither a maximum data rate (for existing devices it can reach dozens of
MHz) nor any particular addressing scheme or acknowledgment mechanism.

The /-wire interface is intended to provide a low data rate communication and power supply 
over a single physical line [12]. As revealed in Figure 3.3d, the 1-wire network consists of a master 
device and several slave devices that are connected over the single physical line. The 1-wire line 
is pulled up with a resistor and can be used for supplying power to the slave devices [12]. The 
communication over 1-wire bus starts with the reset and synchronization sequence, when the 
master device first pulls the 1-wire line down for the period of time over 480 us and then releases it. 
After that, the connected slave devices have to signal their presence by pulling the 1-wire line down. 
Thereafter, the master device can start sending or receiving data from the slave devices. For this, 
the master device first selects the required slave device by sending its unique 64-bit serial number 
during the read only memory (ROM) command phase and then it starts sending or receiving the 
actual data. If the master device does not know all of the connected slave devices, it can discover 
their serial numbers using special procedures. The transmission of each single bit from master to
slave, or vice versa, on a 1-wire bus is initiated by the master device, which first pulls the line
down and then either releases it (for transmitting “1” or receiving data from the slave) or keeps
it low (for transmitting “0”). The main advantages of a 1-wire bus are its simplicity and the support for
device discovery and single-valued identification. The main disadvantage of the 1-wire interface 
is its low data rate, which usually does not exceed 16 kbit/s, although this low data rate allows 
implementation of 1-wire networks with cable lengths up to 300 m [13]. 

The universal asynchronous receiver/transmitter (UART) interface is no longer very widely used 
by sensors, but it is still quite often used by actuators or for inter-processor communication. We
are not going to discuss the UART communication in detail, as it is rather well known. Figure 3.3b 
shows the basics for a UART interface and data formats. 

This discussion has covered the interfaces most widely used by general-purpose plain sensors. As
can be seen, most of the currently existing plain transducer interfaces cannot provide support
for the features required for intelligent sensor implementation, such as sensor discovery or
single-valued identification.


3.2.2 Smart Transducer Interfaces and IEEE 1451 Standard 


As shown in Section 3.2.1, the majority of the existing general transducer interfaces are unable to 
provide support for any intelligent sensor functionality. That fact has been the main driving force 
for the development of special smart transducer interfaces since the 1990s. Although several smart
transducer interface standards have been proposed in recent years, the most promising one today
is the IEEE 1451 set of standards, which was also adopted in 2010 as the ISO/IEC/IEEE 21451
standard and which is discussed in more detail in the current subsection [14,15].

The IEEE 1451 family of standards includes seven documents that define the set of common
communication interfaces for connecting smart transducers to microprocessor-based systems,
instruments, and networks in a network-independent environment [1]. As Figure 3.4 reveals,
the smart transducer for IEEE 1451 is divided into two components that are interconnected
through the transducer-independent interface (TII): the actual NCAP and the transducer interface
module (TIM), which contains sensors and actuators, signal conditioning, and data conversion.
Although IEEE 1451 does not specify the actual physical and media access control (MAC) layers
for the TII, it provides the interface for different standardized technologies through IEEE 1451.2
(wired point-to-point communication; e.g., SPI, UART, USB), IEEE 1451.3 (wired network; e.g.,
1-wire), IEEE 1451.4 (mixed mode, i.e., interfacing transducers with analog output to the NCAP),
IEEE 1451.5 (wireless communication; e.g., ZigBee, 6LoWPAN, Bluetooth, WiFi), IEEE 1451.6
(CANopen), and IEEE 1451.7 (RFID) [1,14,16]. The discovery, access, and control mechanisms
for the transducer supported by IEEE 1451, both for TIMs connected to NCAPs and for smart
transducers within the network, allow achievement of the highest level of network interoperability
and implementation of global smart SNs (e.g., see Figure 3.5).

For implementing required smart transducer features, the IEEE 1451 family of standards 
defines the transducer electronic data sheets (TEDS) for each transducer (or TIM) connectible to 
the NCAP. The TEDS is a memory device attached to the actual transducer (or a memory location
accessible by the NCAP, the so-called virtual TEDS) that stores transducer identification,
calibration, correction, and manufacturer-related information (see Figure 3.6a). Depending on
the communication interface, the structure of the TEDS can differ slightly. The general TEDS 
structure, defined by IEEE 1451.0, specifies that the TEDS should include four required and 
up to six optional components with the specified structure [14,17]. The required parts are (see 
Figure 3.6a): Meta-TEDS (stores all of the information needed to gain access to any transducer 
channel and common information for all transducer channels); transducer channel TEDS (provides 
detailed information about each transducer; e.g., what is measured/controlled and the ranges); user's
transducer name TEDS (provides a place to store the name by which this transducer will be known
to the system or end users); and PHY TEDS (stores all information about physical communications
media between the TIM and the NCAP) [14,18]. The optional TEDS include calibration TEDS, 
frequency response TEDS, Transfer Function TEDS, text-based TEDS, end user application 
specific TEDS, and manufacturer defined TEDS. Therefore, the minimum size of the TEDS for, 
e.g., an IEEE 1451.2 transducer is around 300 bytes, whilst the size of a full-featured TEDS can 
reach many kilobytes [19]. This is why the simplified—so-called basic—TEDS structure has been 
provided in IEEE 1451.4 (see Figure 3.6b); this basic structure consists of only 8 bytes that contain 
the identification information [20,21]. 
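To make the 8-byte basic TEDS more concrete, the following Python sketch packs and unpacks its identification fields. It is only an illustration under stated assumptions: the 5-bit version letter, 6-bit version number, and 24-bit serial number follow Figure 3.6b, whereas the 14-bit manufacturer ID, the 15-bit model number, the field order, and the little-endian packing are assumptions chosen so that the fields fill exactly 64 bits.

```python
# (name, width in bits); the 14- and 15-bit widths and the order are assumptions.
BASIC_TEDS_FIELDS = [
    ("manufacturer_id", 14),
    ("model_number", 15),
    ("version_letter", 5),   # 0..25 standing for A..Z
    ("version_number", 6),   # 0..63
    ("serial_number", 24),
]


def pack_basic_teds(values: dict) -> bytes:
    """Pack the identification fields into 8 bytes, first field in the lowest bits."""
    word, shift = 0, 0
    for name, width in BASIC_TEDS_FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit into {width} bits")
        word |= value << shift
        shift += width
    return word.to_bytes(8, "little")


def unpack_basic_teds(blob: bytes) -> dict:
    """Inverse of pack_basic_teds."""
    if len(blob) != 8:
        raise ValueError("basic TEDS must be exactly 8 bytes long")
    word = int.from_bytes(blob, "little")
    fields = {}
    for name, width in BASIC_TEDS_FIELDS:
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


# Hypothetical transducer identity used only for the round-trip demonstration.
example = {"manufacturer_id": 43, "model_number": 7001, "version_letter": 1,
           "version_number": 2, "serial_number": 123456}
blob = pack_basic_teds(example)
decoded = unpack_basic_teds(blob)
```

The round trip shows why such a structure is attractive for small nodes: the complete identity of a transducer fits into a single 64-bit word.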

Besides the actual TEDS data, the IEEE 1451 TIM has to contain at least the minimum set of 
hardware and software to respond to NCAP requests and initiate certain service request messages. 
In its most basic form, the TIM should [17] 

■ Respond to TIM discovery queries

■ Respond to transducer access requests

■ Respond to and sometimes initiate transducer management tasks (e.g., sending alerts)

■ Respond to and support TEDS management functions


In addition, a TIM may implement various communication, data conversion, signal processing, 
and calibration algorithms and mechanisms [17]. 

Although the NCAP network interfaces are not within the scope of the IEEE 1451 standard,
some special cases are included in the standard. In particular, IEEE 1451.0 describes the
communication process between a remote network-based client and an IEEE 1451 NCAP server using
the hypertext transfer protocol (HTTP). This feature allows implementation of direct communication
between NCAPs or allows a single remote web-based client to obtain transducer data from various
types of IEEE 1451 transducers connected, e.g., to the World Wide Web [17].

Although IEEE 1451 has numerous advantages, there are some trade-offs. Its main strengths are
the ability to interface any type of transducer (e.g., digital, quasi-digital, and analog ones; see
Figure 3.5) to the NCAP in a universal way and the support for a wide set of intelligent transducer
functions, although most often this requires adding an external TIM board with a TEDS block
and an IEEE 1451.X communication controller to each transducer. The main factors that limit
the wide dissemination of the IEEE 1451 standard are its complexity and the need for additional
components for its implementation (e.g., memory and a controller for implementing the required
TIM), which significantly increases the price of IEEE 1451 compatible solutions [22]. In addition,
the absence of IEEE 1451-based SN components, such as transducers, TIMs, and NCAPs, on the
market is also a negative factor [5].


[Figure 3.6 contents, as far as recoverable: (a) required TEDS blocks, each ending with a 2-byte
checksum; the fields shown include TEDS length (4 bytes), TEDS ID header (4 bytes), globally unique
identifier (UUID, 10 bytes), timing-related information (12 bytes), number of implemented
transducer channels (Maxchan, 2 bytes), data converter-related information, and, for the PHY TEDS,
physical connection specific data. (b) Basic TEDS fields: manufacturer ID, model number, version
letter (5 bits, range A-Z), version number (6 bits, range 0-63), and serial number (24 bits).]
Figure 3.6 IEEE 1451 TEDS formats. (a) IEEE 1451.0 required TEDS. (b) IEEE 1451.4 basic TEDS.



3.2.3 Implementation of Smart Interface Features over Plain Interfaces in SNs


We have, thus far, discussed the most widespread interfaces for plain sensors and the existing smart
transducer interfaces. Section 3.2.1 showed that the majority of existing interfaces for plain
transducers do not support any intelligent transducer features. Although special smart transducer
interfaces exist (e.g., the IEEE 1451 discussed in Section 3.2.2), due to their complexity and high
costs, these still have very limited application scope. This makes the problem of implementing
intelligent interface features for sensors with general plain interfaces very real. In the current
section, we will show how this can be resolved.

As revealed in Section 3.2.1, the transducers with analog and quasi-digital interfaces have no 
means either to inform the smart transducer’s processor (or NCAP) about their connection or to 
provide any sort of transducer identification information. For these systems, the only way to detect 
a sensor connection is to force the processor to perform regular monitoring of the sensor lines. In 
this case, the sensor connection can be discovered through the analysis of the signal on these lines. 
Nevertheless, identification of sensors by a standalone processor is usually impossible due to the 
absence of any identification information in the signal arising from these types of sensors. 

The only possible option that can provide some capabilities for identifying analog or
quasi-digital sensors is to use the measurement data from already known sensors on the same or
neighboring network nodes and to calculate the correlation between their data and the unknown
sensor's data (see Figure 3.7). Of course, this method assumes that some correlation exists between
the measurements from the different sensors or sensor nodes.


[Figure 3.7 labels: Pressure; Audio; Unknown sensor data; Temperature sensor data; Sensor; NCAP;
sensor node.]
Figure 3.7 Analog sensor identification based on data correlation with neighboring nodes.

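The correlation-based identification idea can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular reference: the Pearson coefficient, the 0.9 threshold, and all sample values are assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sample series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("two series of equal length (>= 2) are required")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_x * var_y)


def guess_sensor_type(unknown, references, threshold=0.9):
    """Return (type, |r|) of the best-correlated known sensor, or (None, |r|)."""
    best_type, best_r = None, 0.0
    for sensor_type, samples in references.items():
        r = abs(pearson(unknown, samples))
        if r > best_r:
            best_type, best_r = sensor_type, r
    return (best_type, best_r) if best_r >= threshold else (None, best_r)


# Hypothetical synchronized samples from known sensors on neighboring nodes.
known = {
    "temperature": [20.0, 20.5, 21.1, 21.8, 22.0, 21.4],
    "pressure": [1013.2, 1013.0, 1013.3, 1012.9, 1013.1, 1013.2],
}
# Hypothetical raw ADC counts from the unidentified analog sensor.
unknown_counts = [410, 421, 432, 447, 450, 439]
sensor_type, strength = guess_sensor_type(unknown_counts, known)
```

In practice the samples would have to be time-aligned and collected over a period long enough for the measured quantity to vary, otherwise the coefficient is dominated by noise.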

Unlike analog and quasi-digital sensors, full-featured digital sensors do not start sending
measured data to the processor immediately upon connection; instead, they wait for a request from
the processor. This makes connection/disconnection detection for digital sensors rather
complicated, especially if several sensors can be connected to the processor over the same physical
lines [5]. Therefore, the most effective solution for detecting a digital sensor connection would be
the use of some external signal generated each time a new sensor is connected. This can be
implemented, e.g., through the use of specially designed connectors [5].

In addition to the mechanisms already available for identifying digital sensors (e.g., the address
schemes available for the I²C or 1-wire interfaces), the data available from their memory can be
used. Sensor identification can be implemented for the widest range of digital interfaces, including
the ones that do not have any standardized identification mechanisms, by the mechanisms suggested
in Ref. [5], which are based on the following four methods:


■ Read from any sensor registers or any other command execution that will return data known
in advance (sensor identification is based on the facts of correctness of the physical connection
settings, the existence of registers/commands, and the correctness of the retrieved data)

■ Write and subsequently read from certain registers with inaccessible bits, i.e., the bits with
values that could not be changed (sensor identification is based on the facts of correctness of
physical connection settings, the existence of a register, and the position of unchangeable bits)

■ Execute a command for which the range of possible return values is known (e.g., a
temperature measurement, where sensor identification is based on the facts of the correctness of
the physical connection settings, acknowledgment of a command execution, and returned data
falling within known limits)

■ Write and subsequently read from certain registers or a certain command execution (sensor
identification is based only on the facts of correctness of the physical connection settings
and acknowledgment of register existence/command execution).


Therefore, this type of sensor identification mechanism can be implemented using the simple tryout
algorithm presented in Figure 3.8a. The database, which contains unique request-response data
for all possible sensors, can be stored either on each node or in a special location within the SN
(e.g., the resource center [RC] node; see Figure 3.8b). In addition, the RC nodes can be used
to provide “sensor drivers”, the pieces of software for sensor node processors that implement
the required sensor functionality. This allows the sensor nodes initially to be free of any
sensor-dependent software and to download the required sensor drivers from the network
once the node is attached to it and has identified its available sensors. This mechanism can be
used to implement a complete plug-and-play mechanism for sensors with plain digital interfaces.
As has been shown in Ref. [5], the suggested identification method can be used for any type
of commercially available sensors that utilize plain digital interfaces (i.e., SPI, I²C, 1-wire, or
proprietary ones) without utilization of any additional components. Nevertheless, since the
suggested mechanism is not based on TEDS or any similar mechanism, it is not possible to prove
that it would allow single-valued identification for all existing transducers.


Figure 3.8 Plain digital sensor plug-and-play mechanisms. (a) Plain digital sensor identification
algorithm (single device tryout). (b) Network structure with support for plain digital sensor
plug-and-play.

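The tryout idea can be sketched as a loop over a database of request-response probes. The following Python fragment is only an illustration under assumptions: the `Probe` structure, the `transact` callback standing in for a bus driver, and all register addresses and values are hypothetical and are not taken from Ref. [5].

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Probe:
    sensor_name: str                  # entry of the identification database
    request: bytes                    # register read or command to try
    matches: Callable[[bytes], bool]  # does the response fit this sensor?


def identify_sensor(transact: Callable[[bytes], bytes],
                    probes: Sequence[Probe]) -> Optional[str]:
    """Try each probe in turn; return the first sensor whose response matches."""
    for probe in probes:
        try:
            response = transact(probe.request)
        except IOError:
            continue  # no acknowledgment: wrong settings or no such register
        if probe.matches(response):
            return probe.sensor_name
    return None


# Hypothetical database: one probe with a fixed expected value, one with a range.
database = [
    Probe("sensor_A", b"\x0f", lambda r: r == b"\x33"),
    Probe("sensor_B", b"\xd0", lambda r: len(r) == 1 and 0x50 <= r[0] <= 0x5f),
]


def fake_bus(request: bytes) -> bytes:
    """Stand-in for a real bus transaction; answers only the 0xD0 request."""
    if request == b"\xd0":
        return b"\x58"
    raise IOError("no acknowledgment")


found = identify_sensor(fake_bus, database)
```

Ordering the database from the most specific probes (fixed known values) to the least specific ones (mere acknowledgment) reduces the chance of a false match.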

The networking of the nearby sensor nodes also provides some means for implementing sensor 
diagnosis or malfunction detection. Similar to the suggested correlation-based analog sensor iden- 
tification mechanism (see Figure 3.7), the correlation of neighboring node measurements can be 
used to detect a sensor malfunction. Although this method is not as reliable as the on-node sensor 
diagnosis of smart sensors, it can partially compensate for the absence of appropriate features for 
plain sensors. 


3.3 Intelligent Networking for Sensor Node 


The contemporary sensor networks (SNs) can include numerous heterogeneous nodes manufactured
by various vendors [16]. Nonetheless, for proper network operation, each of these nodes
should be able to communicate with any other sensor node and with different external devices
through some common communication protocol. The success of the transmission control
protocol/internet protocol (TCP/IP) over recent years has made the TCP/IP protocol stack the de
facto standard for large-scale networking [23]. Recent developments have also allowed adoption of
TCP/IP communication for the extreme communication conditions of SNs [23,24]. In the current
section, we will not focus on the hardware implementation of TCP/IP over each specific wired or
wireless communication technology that can be utilized by an intelligent SN; instead, we will go
one step further and discuss the features of using IP communication within SNs.

One of the main differences in implementing the TCP/IP for SNs, compared to its use in 
general computer networks, is the very limited resources that are available for SN nodes [16,23]. 
Indeed, many SN nodes have a very restricted amount of memory that limits the possibility for 
these nodes to store data on-node. In addition, due to the limited energy and processing capabilities 
of NCAPs on SN nodes, delegating data processing and analysis to a server device is sometimes more
convenient. These considerations, together with the limited amount of energy available on an SN 
node for transferring data, demonstrate the importance of choosing a format for the message within 
the SN that will provide sufficiently high compatibility and low energy consumption. 

Besides the universal communication and message format, the intelligent SN concept requires 
the SN nodes (i.e., the NCAPs) to be discoverable and identifiable [25]. 

The rest of the current section is organized as follows: Section 3.3.1 discusses the basic
structure of the web service stack used for IP-based SNs, Section 3.3.2 discusses ways to represent
the data in an IP-based SN, and Section 3.3.3 discusses possible ways to implement intelligent
functionality in an IP-based SN.


3.3.1 Basic Structure of a Web Service Stack for an IP-Based SN 


The structure of a typical web service stack for an SN application can be described as follows.
Initially, the web services (i.e., the software system that supports interoperable
machine-to-machine interaction over a network) are used to set up the connection between client
and server end-points in a platform-independent manner. In SNs, the NCAPs and end-user terminals
usually act as client end-points that are connected to a server end-point that runs the database
and that contains the measurement data or the data about the sensors (see, e.g., Figure 3.9). Most
often in SNs, the web services are built on a TCP/IP protocol stack, on top of which HTTP is
implemented [23,26].

After the initial connection has been established (e.g., consider the case when TCP/IP and 
HTTP are implemented), the client can send the required request messages to the server. Although 
this can be done using any HTTP method, in SNs the GET, POST, PUT, or DELETE methods are typically
used [26]. Finally, the client, using a standardized method, specifies the data that should be
placed on the server or requested from the server [27] (see Figure 3.10 for an example).

In response, the server informs the client if the request was successful and provides the requested 
data, if any data have been requested (see Figures 3.9 and 3.10). HTTP is able to carry the data 
(the “body” of the message) in either direction using any multipurpose internet mail extensions 
(MIME) type and encoding. The body of the HTTP web service message is typically encoded using 
extensible markup language (XML) and the format is known both to the client and the server [28]. 
In order to perform sequences of remote procedure calls (RPCs), the simple object access protocol 
(SOAP) may be encapsulated into XML body data, although this would inevitably complicate the 
client and server software [29]. 

After the sequence of requests is complete, the TCP connection is usually closed (see Figure 3.9). 
Figure 3.9 Typical client-server interaction and message exchange with SN. (a) Architecture.


Client request:
    POST /WSNService HTTP/1.1
    Host: www.centria.fi
    Content-Type: application/json; charset=utf-8
    Content-Length: 16

    {"temperature-value":8}

Server response:
    HTTP/1.1 200 OK
    Date: Mon, 5 Dec 2011 21:05:16 GMT
    Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
    Accept-Ranges: bytes
    Connection: close
    Content-Type: text/html; charset=UTF-8

(a)

Client request:
    GET /index.html HTTP/1.1
    Host: www.centria.fi?temperature

Server response:
    HTTP/1.1 200 OK
    Date: Mon, 5 Dec 2011 22:38:34 GMT
    Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
    Accept-Ranges: bytes
    Connection: close
    Content-Type: text/html; charset=UTF-8

    {"temperature-value":8}

(b)


Figure 3.10 Examples of HTTP GET and POST requests and responses within an SN. (a) POST
request and response. (b) GET request and response.
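The request side of such an exchange can be sketched in Python without any networking, which also makes the header overhead visible. This is a minimal illustration: the helper name `build_post` is hypothetical, the host, path, and payload are taken from Figure 3.10, and the Content-Length value is computed here from the encoded body.

```python
import json


def build_post(host: str, path: str, payload: dict) -> bytes:
    """Compose a raw HTTP/1.1 POST request carrying a compact JSON body."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/json; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body


request = build_post("www.centria.fi", "/WSNService", {"temperature-value": 8})
head, _, body = request.partition(b"\r\n\r\n")
# Share of the transmitted bytes that is actual measurement payload.
payload_ratio = len(body) / len(request)
```

Even with only three headers, most of the transmitted bytes are protocol text rather than measurement data, which is the overhead issue discussed for HTTP in SNs.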
One of the advantages of utilizing HTTP for transferring the data in an SN is that it simplifies
the required connection configuration and especially the set-up of communication through a firewall
(see, e.g., Figure 3.11: the TCP/IP-based connection [Figure 3.11a] will most probably require
manual firewall rule modification for the server, while the HTTP-based connection [Figure 3.11b]
most probably will require no special actions) [30]. The use of HTTP also provides effective
authentication, compression, and encryption mechanisms [26].

Today's web services can be implemented either using general web-servers (see e.g., Figure 3.11) 
or cloud platforms (see Figure 3.12). The implementation of web services using cloud platforms (see 
Figure 3.12) provides high scalability and cost-efficiency for the application, as it allows dynamic 
reassignment of the data traffic and required processing between the different resources composing 
the cloud [31]. This is especially valuable when handling of simultaneous data requests from 
multiple user clients is required. 

Nonetheless, the implementation of HTTP for SN introduces the following challenges: 


■ High data overhead is caused by the plain text encoding and the verbosity of the HTTP
header format.

■ TCP binding (i.e., the association of a node's input or output channels and file
descriptors with a transport protocol, a port number, and an IP address) seriously reduces the
performance, especially for ad hoc wireless SNs with short-lived connections.

■ The support for a large number of optional HTTP headers, which can have a very complex
structure and include unnecessary information, increases the implementation complexity
and SN data traffic overhead.


3.3.2 Methods for Data Representation in IP-Based SNs 


Minimization of the total message size and maximization of the packet payload conflict with the
Internet standards but are essential for reducing the energy consumption of SN nodes [32]. Indeed,
the standard HTTP headers can be quite voluminous, which usually makes the ratio of payload to
total transferred data rather low [32]. This matters in particular for battery-operated SN nodes
that utilize wireless communication, where every joule of energy used and every additional
transmitted byte counts and decreases the node's operation time. As discussed in Section 3.3.1, in
SNs, XML is widely used for data encoding [23].

The extensible markup language (XML) [28] is a set of rules for encoding documents in 
machine-readable form that is widely used by web services for data presentation and exchange. 
Although the use of XML results in a high level of scalability and interoperability, its use has some 
drawbacks. The most important of these is that XML opening and closing tags can significantly 
increase the size of the data (see Figure 3.13). This problem can be partially solved by using
XML compression mechanisms, such as Binary XML.
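The size increase caused by XML markup can be checked with a short Python sketch that wraps a single reading in the document structure of Figure 3.13. The function name is hypothetical, and the exact byte count depends on how the serializer writes the declaration, so only the ratio is computed.

```python
import xml.etree.ElementTree as ET


def encode_value_as_xml(value: int) -> bytes:
    """Wrap one reading in a <root><data-value> document, as in Figure 3.13."""
    root = ET.Element("root")
    ET.SubElement(root, "data-value").text = str(value)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


payload = b"8"                      # the measurement itself, as text
document = encode_value_as_xml(8)   # the same measurement with XML markup
expansion = len(document) / len(payload)
```

For a one-digit reading the markup outweighs the data by more than an order of magnitude, which is what motivates the binary XML formats discussed next.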

Binary XML is a set of specifications that defines the compact representation of XML in a binary 
format. At present, three binary XML compression formats have been introduced, although none 
of them has yet been widely adopted or accepted as a de facto standard [33]. 

The first format was suggested by the International Organization for Standardization (ISO) and
the International Telecommunication Union (ITU) and is called Fast Infoset. This format is proposed
as an alternative to the World Wide Web Consortium (W3C) XML syntax; it specifies a binary
encoding for the W3C XML Information Set. Fast Infoset has been defined as the ITU-T X.891
and ISO/IEC 24824-1 standards [34]. Unlike the other XML representations, the Fast Infoset
standard can be beneficial both for low bandwidth systems that require high data compression
(e.g., the connection between an NCAP and a server in an SN) and for high performance systems that
utilize web services (e.g., end-user client devices in an SN) [34].


Figure 3.11 Web service layout using a TCP/socket and HTTP in SN. (a) TCP/socket. (b) HTTP.


Figure 3.12 Example of web service for an SN using cloud platforms.


<?xml version="1.0" encoding="utf-8"?>
<root>
    <data-value>8</data-value>
</root>


Figure 3.13 Example of an XML file.


The second format has been suggested by W3C and is called the efficient XML interchange (EXI)
format. It is claimed to be a very compact XML representation that can simultaneously optimize
performance and utilization of computational resources [35]. The EXI format uses a hybrid
approach, drawn from information and formal language theories, in conjunction with
practical techniques for entropy encoding XML information. Using a relatively simple algorithm,
which is amenable to fast and compact implementation, and a small set of data type representations,
EXI reliably produces efficient encodings of XML event streams. The grammar production system
and format definition of EXI are presented, e.g., in Ref. [35].

The third format is called binary extensible markup language (BXML) and has been suggested
by the Open Geospatial Consortium (OGC) especially for geosensor networks. This format can also
be used for compressing sensor data and has been defined using OGC's geography markup language
(GML) geographical features [36,37]. An example of binary XML compression using BXML is 
presented in Figure 3.14. 

The spatial data in this example are very simple and contain only polylines that are stored
in an XML file (the data were taken from a Finnish spatial database, Digiroad). This XML
file contains a coordinate system, optional data, and a collection of coordinates for every object.
This file is next converted to binary format based on OGC's BXML (see Figure 3.14). During
this conversion, the traditional XML tags are encoded with bytes; e.g., a GML element with an
attribute <gml:featureMember fid="1921355"> is encoded using a hexadecimal presentation as
0x03 0x01 0x05 0x04 0xF4 0x00 0x1D 0x51 0x4B 0x06, where 0x03 is an element with attributes,
0x01 is a pointer to the element name in the global string table, 0x05 is an attribute start code,
0x04 is a pointer to the attribute's name in the global string table, 0xF4 is a data type (Int32),
0x00 0x1D 0x51 0x4B is the identification number 1921355, and 0x06 marks the end of attributes and
the whole element.


XML                                           Bytes for OGC's BXML
<gml:featureMember fid="1921355">             10
<drd:coordSystem>KKJ3</drd:coordSystem>       9
<drd:NAME>Savelantie</drd:NAME>               17
<drd:ATTRIBUTE>4</drd:ATTRIBUTE>              8
<gml:coordinates                              2
    decimal="."                               5
    cs=                                       5
    ts="                                      5
    value="3">                                8
    3378741 7111164,                          16
    3378738 7111166,                          16
    3378712 7111177                           13
</gml:coordinates>                            1
</gml:featureMember>                          1
Overall size: 269                             116


Figure 3.14 Example of binary XML compression using BXML.

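The byte sequence described above can be reproduced with a few lines of Python. This is only a sketch of that single example, not an implementation of the BXML specification: the token values and string table indices are the ones quoted in the text, and the big-endian Int32 layout is inferred from the byte order shown there.

```python
import struct

# Token values as quoted in the example above.
ELEMENT_WITH_ATTRIBUTES = 0x03
ATTRIBUTE_START = 0x05
TYPE_INT32 = 0xF4
ATTRIBUTES_END = 0x06


def encode_element_with_int_attribute(element_index: int,
                                      attribute_index: int,
                                      value: int) -> bytes:
    """Encode <element attribute="value"> using global string table indices."""
    return (
        bytes([ELEMENT_WITH_ATTRIBUTES, element_index,
               ATTRIBUTE_START, attribute_index, TYPE_INT32])
        + struct.pack(">i", value)      # 4-byte signed integer, big-endian
        + bytes([ATTRIBUTES_END])
    )


xml_text = '<gml:featureMember fid="1921355">'
encoded = encode_element_with_int_attribute(0x01, 0x04, 1921355)
saving = 1 - len(encoded) / len(xml_text.encode("utf-8"))
```

The saving comes from two sources: element and attribute names are replaced by one-byte table references, and the numeric identifier is stored as a fixed-width integer instead of decimal text.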


3.3.3 Intelligent Functionality for IP-Based SN Nodes 


An SN node (i.e., the NCAP of SN node) is made discoverable and identifiable within the SN 
using either the specifications that have been developed for web services (e.g., SOAP) or the special 
interface specifications that have been developed especially for sensors (e.g., OpenGIS). 

The SOAP [38] relies on XML for its message format and specifies how to exchange structured 
data for web service implementation. SOAP can be used over multiple connection protocols, 
including HTTP. The use of SOAP provides the SN node with the possibility for self-identification 
support, although, as shown in Figure 3.15, the messages that are used by SOAP can be rather 
voluminous. 
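How much a SOAP envelope adds to a single reading can be estimated with the following Python sketch, which rebuilds the message body of Figure 3.15 with the standard library. The namespaces and element names are taken from that figure; the function name and the JSON comparison are assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"
APP_NS = "http://www.centria.fi/sensor01"


def build_soap_insert(value: int) -> bytes:
    """Build a SOAP 1.2 envelope carrying one InsertSensorValue call."""
    ET.register_namespace("soap", SOAP_NS)
    ET.register_namespace("m", APP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}InsertSensorValue")
    ET.SubElement(call, f"{{{APP_NS}}}value").text = str(value)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


soap_message = build_soap_insert(8)
json_message = json.dumps({"temperature-value": 8}, separators=(",", ":")).encode("utf-8")
size_factor = len(soap_message) / len(json_message)
```

The envelope, header, and namespace declarations are fixed costs, so the relative overhead is largest for the short messages that are typical of sensor readings.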

The alternative option is the use of the OpenGIS interface standard that has been developed
by OGC. According to OGC [37], the OpenGIS interface standard defines OpenLS core services,
which form the services framework for the GeoMobility server (GMS). The core services include
the directory service, gateway service, location utility service, presentation service, and route
service.


POST /WSNService HTTP/1.1
Host: www.centria.fi
Content-Type: application/soap+xml; charset=utf-8
Content-Length: 299
SOAPAction: "http://www.w3.org/2003/05/soap-envelope"

<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
    <soap:Header>
    </soap:Header>
    <soap:Body>
        <m:InsertSensorValue xmlns:m="http://www.centria.fi/sensor01">
            <m:value>8</m:value>
        </m:InsertSensorValue>
    </soap:Body>
</soap:Envelope>


Figure 3.15 Example of a SOAP POST message.


GMS, as a location service platform, hosts these services as well as Location Content Databases 
(e.g., web map server [WMS] and web feature server [WFS]) that are accessed through OGC 
Interfaces. For geosensor networks, the main part of OGC’s work is conducted under sensor web 
enablement (SWE), which is a suite of specifications related to sensors, sensor data models, and 
sensor web services. The goal of SWE services is to enable sensors to be accessible and controllable 
via the Internet [39]. 

SWE includes the observations and measurements schema (O&M), sensor model language (SensorML),
transducer markup language (TML), sensor observation service (SOS), sensor planning
service (SPS), sensor alert service (SAS), and web notification service (WNS). XML is a key part of
this infrastructure and all of the services and content models are specified using XML schemas [40].

SensorML provides a rich collection of metadata that can be mined and used for discovery 
of sensor systems and observation process features. These metadata include identifiers, classifiers, 
constraints (time, legal, and security), capabilities, characteristics, contacts, and references, in 
addition to inputs, outputs, parameters, and system location. The calibration information, sensor 
type, and sensor operator could also be included (see Figure 3.16). If required, in addition to 
SensorML, metadata that provide a functional description of a sensor node can also use TML to 
provide additional data about the hardware and the information necessary for understanding the 
data gathering process [41]. 

The use of SWE and SensorML/TML, together with the backend system that keeps the database 
containing the data on all known NCAP features (so-called sensor instance registry [SIR]), provides 
the complete solution for implementing NCAP discovery and identification within SN [42]. In 
this case, once connected to an SN, the NCAP is required to provide the SIR with the SensorML/TML
data (the metadata transfer mechanism is provided by SOS), which reveal the list of sensors
connected to this NCAP and all other data about the NCAP (e.g., its location, if known). Once
these data are included in the SIR, any user will be able to see this NCAP and its sensors. Using the 
methods specified in O&M, the user can establish a direct connection and request the measurement 
data directly from the NCAP. Knowing the NCAP address, the user can also use SAS to subscribe 
for specific event notifications that will be sent directly to the user by the NCAP.

<sml:SensorML xmlns:sml="http://www.opengis.net/sensorML/1.0"
    xmlns:swe="http://www.opengis.net/swe/1.0"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.opengis.net/sensorML/1.0
    http://schemas.opengis.net/sensorML/1.0.0/sensorML.xsd" version="1.0">
  <member xlink:role="urn:ogc:def:role:OGC:centriasensor">
    <Component gml:id="Centria">
      <swe:value>8</swe:value>
    </Component>
  </member>
</sml:SensorML>


Figure 3.16 An example of SensorML data. 
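
As a rough illustration of how a registry or client might read such metadata, the following sketch parses a well-formed rendering of the fragment in Figure 3.16 with Python's standard XML parser. The function name summarize_sensorml and the returned dictionary layout are assumptions made for this example only; they are not part of SWE, SensorML, or any SIR implementation.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in Figure 3.16.
NS = {
    "swe": "http://www.opengis.net/swe/1.0",
    "gml": "http://www.opengis.net/gml",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Well-formed rendering of the figure's fragment (schema location omitted).
SENSORML_DOC = """\
<sml:SensorML xmlns:sml="http://www.opengis.net/sensorML/1.0"
              xmlns:swe="http://www.opengis.net/swe/1.0"
              xmlns:gml="http://www.opengis.net/gml"
              xmlns:xlink="http://www.w3.org/1999/xlink"
              version="1.0">
  <member xlink:role="urn:ogc:def:role:OGC:centriasensor">
    <Component gml:id="Centria">
      <swe:value>8</swe:value>
    </Component>
  </member>
</sml:SensorML>
"""


def summarize_sensorml(xml_text):
    """Hypothetical helper: pull the role, component id, and values out of the fragment."""
    root = ET.fromstring(xml_text)
    member = root.find("member")          # unprefixed, as in the figure
    component = member.find("Component")  # unprefixed, as in the figure
    return {
        "role": member.get("{%s}role" % NS["xlink"]),
        "component_id": component.get("{%s}id" % NS["gml"]),
        "values": [v.text for v in component.iter("{%s}value" % NS["swe"])],
    }


summary = summarize_sensorml(SENSORML_DOC)
print(summary)
```

A real SensorML document carries far more metadata (identifiers, capabilities, contacts, and so on); the same attribute and element lookups would apply to those parts.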


In recent years, the OGC's SWE specifications have been effectively adopted by the National 
Aeronautics and Space Administration (NASA) in their earth observing (EO-1) geosensor networks 
system [43]. As reported in Refs. [44,45], SWE has also been effectively implemented over the 
IEEE 1451 NCAPs, which has resulted in the highest level of interoperability and intelligent 
functionality supported both for sensors on sensor nodes and for sensor nodes inside a network. 


3.4 Conclusions 


In the current chapter, we have focused on the problem of intelligent sensor network node 
interfacing from two sides. 

In the first place, we have discussed the interface between actual hardware sensors and the 
processing unit of an intelligent sensor network node. We have also discussed in detail the most 
widespread plain sensor interfaces (e.g., analog, quasi-digital, and digital ones) and smart sensor 
interfaces (e.g., IEEE 1451). The plain sensor interfaces most widely used today do not support 
any of the features that are required for implementing intelligent sensor functionality. The special 
smart sensor interfaces, although providing the required functionality, have a rather complicated 
structure that results in increased price and power consumption for this solution. This is especially 
undesirable for wireless SNs, which often can have very limited resources. Therefore, we have 
suggested several mechanisms that can be used to implement some intelligent sensor functionalities 
over plain interfaces using only the processing capabilities of the sensor network nodes and the 
existing resources of the sensor networks. 

In the second place, we have considered the interfacing of an intelligent sensor network node 
within a network. We have discussed the structure of a web service stack and features for using 
internet protocol (IP)-based communication within sensor networks. We have also discussed the 
problems of data representation and compression in these types of networks. Finally, we have
shown how the required network-level intelligent functionality can be implemented for IP-based
sensor networks.

The chapter shows that much has already been done to enable intelligent sensor networks,
but much remains to be done. One of the major factors that presently limit the dissemination of
intelligent sensor networks is the high complexity and cost of intelligent sensor network nodes 
and their components. Although these costs can be reasonable for some life-critical applications 
(e.g., volcano eruption, earthquake, or tsunami alarm systems), for the majority of everyday life 
applications, the costs are still too high. This makes the development of simpler solutions for 
intelligent functionality implementations in sensor networks one of the most important tasks of 
present-day research. 


References 


1. E. Song and K. Lee, STWS: A unified web service for IEEE 1451 smart transducers, IEEE Trans.
Instrum. Meas., 57, 1749-1756, August 2008.

2. S. Yurish, Smart sensor systems integration: New challenges, IFSA, [Online]. Available:
http://www.iaria.org/conferences2011/filesICN11/KeynoteSergeyYurish.pdf, January 2011.

3. The internet of things executive summary, ITU, Geneva, Switzerland, ITU Internet Reports
2005 [Online]. Available:
http://www.itu.int/dms_pub/itu-s/opb/pol/S-POL-IR.IT-2005-SUM-PDF-E.pdf

4. K. Lee and R. Schneeman, Distributed measurement and control based on the IEEE 1451 smart
transducer interface standards, IEEE Trans. Instrum. Meas., 49, 621-627, June 2000.

5. K. Mikhaylov, T. Pitkaaho, and J. Tervonen, Plug-and-play mechanism for plain transducers with
digital interfaces attached to wireless sensor network nodes, submitted for publication.

6. K. Mikhaylov, J. Tervonen, and D. Fadeev, Development of energy efficiency aware applications using
commercial low power embedded systems, in Embedded Systems - Theory and Design Methodology,
K. Tanaka, Ed., Rijeka, Croatia: InTech, 2012, pp. 407-430.

7. G. Meijer, Smart Sensor Systems. Hoboken, NJ: John Wiley & Sons, Inc., 2008.

8. S. Yurish, Extension of IEEE 1451 Standard to Quasi-Digital Sensors, Proc. SAS'07, San Diego, CA,
USA, pp. 1-6, February 2007.

9. N. Kirianaki, S. Yurish, N. Shpak, and V. Deynega, Data Acquisition and Signal Processing for Smart
Sensors. Chichester, U.K.: John Wiley & Sons, 2001.

10. NXP Semiconductors, I2C-bus specification and user manual (UM10204 rev. 4), Eindhoven,
Netherlands, 2012. Retrieved from http://www.nxp.com/documents/user_manual/UM10204.pdf

11. Motorola Semiconductor Products Inc., SPI Block Guide (V03.06), Schaumburg, IL, 2003. Retrieved
from http://www.ee.nmt.edu/~teare/ee308l/datasheets/S12SPIV3.pdf

12. Maxim Integrated Products, 1-Wire Communication Through Software (AN 126), San Jose, CA,
2002. Retrieved from http://pdfserv.maxim-ic.com/en/an/AN126.pdf

13. Maxim Integrated Products, Book of iButton Standards (AN 937), San Jose, CA, 1997. Retrieved
from http://pdfserv.maxim-ic.com/en/an/AN937.pdf

14. IEEE Instrumentation and Measurement Society, IEEE Standard for a Smart Transducer Interface
for Sensors and Actuators - Common Functions, Communication Protocols, and Transducer
Electronic Data Sheet (TEDS) Formats (IEEE Std. 1451.0-2007), New York, NY, 2007. Retrieved from
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4346346

15. IEEE Instrumentation and Measurement Society, ISO/IEC/IEEE Information technology - Smart
transducer interface for sensors and actuators - Common functions, communication protocols, and
Transducer Electronic Data Sheet (TEDS) formats (ISO/IEC/IEEE Std. 21450-2010), New York,
NY, 2010. Retrieved from http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5668466


16. M. Kuorilehto, M. Kohvakka, J. Suhonen, P. Hamalainen, M. Hannikainen, and T. Hamalainen,
Ultra-Low Energy Wireless Sensor Networks in Practice: Theory, Realization and Deployment. Hoboken,
NJ: John Wiley & Sons, Inc., 2007.

17. J. Wiczer and K. Lee, A unifying standard for interfacing transducers to networks - IEEE-1451.0,
Proc. Sensors ISA Expo 05, Chicago, IL, USA, pp. 1-10, 2005.

18. S. Manda and D. Gurkan, IEEE 1451.0 compatible TEDS creation using .NET framework, Proc.
SAS'09, New Orleans, LA, USA, pp. 281-286, February 2009.

19. J. Wiczer, A summary the IEEE-1451 family of transducer interface standards, Proc. Sensors Expo'02,
San Jose, CA, USA, pp. 1-9, May 2002.

20. D. Wobschall, IEEE 1451 - A universal transducer protocol standard, Proc. Autotestcon'07,
Baltimore, MD, USA, pp. 359-363, September 2007.

21. IEEE Instrumentation and Measurement Society, ISO/IEC/IEEE Standard for Information
technology - Smart transducer interface for sensors and actuators - Part 4: Mixed-mode
communication protocols and Transducer Electronic Data Sheet (TEDS) formats (ISO/IEC/IEEE Std.
21451-4-2010), New York, NY, 2010. Retrieved from
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5668460

22. K. Lee, M. Kim, S. Lee, and H. Lee, IEEE-1451-based smart module for in-vehicle networking systems
of intelligent vehicles, IEEE Trans. Ind. Electron., 51, 1150-1158, December 2004.

23. A. Dunkels, Towards TCP/IP for wireless sensor networks, Licentiate thesis, Malardalen University,
Eskilstuna, Sweden, 2005.

24. J. Higuera and J. Polo, IEEE 1451 standard in 6LoWPAN sensor networks using a compact
physical-layer transducer electronic datasheet, IEEE Trans. Instrum. Meas., 60(8), 2751-2758, August
2011.

25. X. Chu and R. Buyya, Service oriented sensor web, in N.P. Mahalik (Ed.), Sensor Network and
Configuration: Fundamentals, Techniques, Platforms, and Experiments. Berlin, Germany:
Springer-Verlag, 2006.

26. R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee,
Hypertext Transfer Protocol - HTTP/1.1 (RFC 2616), 1999. Retrieved from
http://256.com/gray/docs/rfc2616/03.html

27. D. Crockford, RFC4627-The application/json media type for javascript object notation (json), The
Internet Society, 2006.

28. T. Bray, J. Paoli, C. Sperberg-McQueen, E. Maler, and F. Yergeau, Extensible markup language
(XML) 1.0 (5th edn.), World Wide Web Consortium Std. 2008.

29. M. Olson and U. Ogbuji, The Python web services developer: Messaging technologies compared, IBM
developerWorks, [Online]. Available: http://www-128.ibm.com/developerworks/library/ws-pyth9/,
July 2002.

30. M. Strebe and C. Perkins, Firewalls 24seven. Alameda, CA: Sybex Inc., 2002.

31. D. Chappell, Introducing the windows azure platform, Microsoft Corporation, [Online]. Available:
http://msdn.microsoft.com/en-us/library/ff803364.aspx, October 2010.

32. A. Castellani, M. Ashraf, Z. Shelby, M. Luimula, J. Yli-Hemminki, and N. Bui, Binary WS: Enabling
the embedded web, Proc. Future Network and Mobile Summit'10, Florence, Italy, pp. 1-8, June
2010.

33. J. Chang, i-Technology viewpoint: The performance woe of binary XML, [Online]. Available:
http://soa.sys-con.com/node/250512, August 2008.

34. Telecommunication Standardization Sector of International Telecommunication Union, Series X:
Data Networks, Open System Communications And Security: Information technology - Generic
applications of ASN.1: Fast infoset (ITU-T Recommendation X.891), Geneva, Switzerland, 2005.
Retrieved from
http://www.itu.int/rec/dologin_pub.asp?lang=e&cid=T-REC-X.891-200505-I!!PDF-E&type=items

35. J. Schneider and T. Kamiya (Eds.), Efficient XML interchange (EXI) format 1.0 (W3C
Recommendation, 10 March 2011), World Wide Web Consortium, 2011. Retrieved from
http://www.w3.org/TR/2011/REC-exi-20110310/

36. Binary extensible markup language (BXML) encoding specification (OGC 03-002r9), Open
Geospatial Consortium Inc., C. Bruce (Ed.), 2006. Retrieved from
http://portal.opengeospatial.org/files/?artifact_id=13636

37. M. Botts (Ed.), Open GIS sensor model language (SensorML) Implementation Specification (OGC
05-086r2), Open Geospatial Consortium Inc., 2006. Retrieved from
http://portal.opengeospatial.org/files/?artifact_id=13879

38. D. Box, D. Ehnebuske, G. Kakiwaya, A. Layman, N. Mendelson, H.F. Nielsen, S. Tathe, and
D. Winer, Simple object access protocol (SOAP) 1.1, World Wide Web Consortium, 2000. Retrieved
from http://www.w3.org/TR/2000/NOTE-SOAP-20000508/

39. A. Sheth, C. Henson, and S.S. Sahoo, Semantic sensor web, IEEE Internet Comput., 12(4), 78-83,
July 2008.

40. M. Botts, G. Percivall, C. Reed, and J. Davidson, OGC Sensor web enablement: Overview and high
level architecture, in S. Nittel, A. Labrinidis, and A. Stefanidis (Eds.), GeoSensor Networks, 4540,
175-190, December 2007.

41. A. Walkowski, Sensor web enablement - An overview, in M. Grothe and J. Kooijman (Eds.), Sensor
Web Enablement, 45, 69-72, 2008.

42. S. Jirka, A. Bröring, and C. Stach, Discovery mechanism on sensor web, Sensors, 6, 2661-2681, 2009.

43. S. Chien et al., Lights out autonomous operation of an earth observing sensorweb, Proc. RCSGSO 07,
Moscow, Russia, pp. 1-8, June 2007.

44. E. Song and K. Lee, Integration of IEEE 1451 smart transducers and OGC-SWE using STWS, Proc.
SAS'09, New Orleans, LA, USA, pp. 298-303, February 2009.

45. S. Fairgrieve, J. Makuch, and S. Falke, PULSENet™: An implementation of sensor web standards,
Proc. CTS'09, Baltimore, MD, USA, pp. 64-75, May 2009.


Chapter 4 


Smart Wireless Sensor Nodes 
for Structural Health 
Monitoring 


Xuefeng Liu, Shaojie Tang, and Xiaohua Xu 


Contents 
4.1 Introduction
4.2 Related Works
4.3 Background: Modal Analysis
    4.3.1 Modal Parameters
    4.3.2 The ERA
4.4 Distributed Modal Analysis
    4.4.1 Stage 1: Try to Distribute the Initial Stage of Modal Analysis Algorithms
    4.4.2 Stage 2: Divide and Conquer
4.5 WSN-Cloud SHM: New Possibility toward SHM of Large Civil Infrastructures
References


Because of their low cost, high scalability, and ease of deployment, wireless sensor networks (WSNs)
are emerging as a promising sensing paradigm that the structural engineering field has begun to
consider as a substitute for traditional tethered structural health monitoring (SHM) systems. For
a WSN-based SHM system, particularly one used for long-term monitoring, embedding SHM
algorithms within the network is usually necessary, both to provide real-time information about the
structure's health status and to avoid the high cost of streaming raw data wirelessly.
However, unlike other monitoring applications of WSNs, such as environmental and
habitat monitoring, where embedded algorithms are as simple as average, max, min, etc., many
SHM algorithms are centralized and computationally intensive. Implementing SHM algorithms
within WSNs hence becomes the main roadblock for many WSN-based SHM systems. This chapter
focuses on designing and implementing effective and energy-efficient SHM algorithms in
resource-limited WSNs. We first give a summary review of recent advances in embedding SHM
algorithms within WSNs. Then modal analysis, a classic technique widely adopted in
SHM, is chosen as an example, and how it can be embedded within a WSN is described
in a step-by-step manner. Finally, we propose a WSN-Cloud system architecture, which we believe
is a promising paradigm for future SHM.


4.1 Introduction 


Civil structures such as dams, long-span bridges, skyscrapers, etc., are critical components of the
economic and industrial infrastructure. Therefore, it is important to monitor their integrity and
detect/pinpoint any possible damage before it reaches a critical state. This is the objective of
SHM [1].

Traditional SHM systems are wire-based and centralized. In a typical SHM system, different
types of sensors, such as accelerometers and strain gauges, are deployed on the structure under
monitoring. These sensor nodes collect the vibration and strain of the structure at different
locations and transmit the data through cables to a central station. Based on the data, SHM
algorithms are run to extract damage-associated information and make corresponding
decisions about the structural condition [1].

According to the duration of deployment, SHM systems can be largely divided into two
categories: short- and long-term monitoring. Short-term SHM systems are generally used in routine
annual inspections or urgent safety evaluations after unexpected events such as earthquakes,
overloads, or collisions. These short-term systems are usually deployed on structures for a few hours
to collect a sufficient amount of data for off-line diagnosis afterward. Examples of short-term SHM
systems can be found on the Humber Bridge in the United Kingdom [2] and the National Aquatic
Centre in Beijing, China [3]. The second category of SHM systems comprises those used for long-term
monitoring. Sensor nodes in these systems are deployed on structures for months, years, or even
decades to monitor the structures' health condition. Unlike short-term monitoring systems,
where data are processed off-line by human operators, most long-term SHM systems require
the health condition of the structure to be reported in real time or near real time. Examples
of long-term monitoring SHM systems can be found on the Tsing Ma Bridge and Stonecutters
Bridge in Hong Kong [4].

The main drawback of traditional wire-based SHM systems is the high cost, which
mainly comes from the centralized data acquisition system (DAC), long cables, sensors, and
in-field servers. For the DAC in particular, the price increases dramatically with the number of
channels it can accept. For example,
the costs of the systems deployed on the Bill Emerson Memorial Bridge and Tsing Ma Bridge reach
$1.3 and $8 million, respectively [4].

In addition, deploying a wire-based SHM system generally takes a long time. This
drawback is particularly apparent in SHM systems used for short-term purposes. Considering that the
length of cables used in an SHM system deployed on a large civil infrastructure can reach thousands
or even tens of thousands of meters, deployment can take hours or even days to obtain just a few
minutes of measurement data. Moreover, constrained by the number of sensor nodes and the capability
of the DAC, it is quite common that an SHM system is repeatedly deployed in different areas of a
structure to carry out measurements. This dramatically increases the deployment cost. We have
collaborated with civil researchers to deploy a wire-based SHM system on the Hedong Bridge in
Guangzhou, China (see Figure 4.1). The DAC system we used can only support inputs from
seven accelerometers simultaneously. To measure the vibration at different locations across the
whole bridge, the system was hence moved to 15 different areas of the bridge in turn.
For each deployment, it took about 2 h for sensor installation, cable
deployment, and initial debugging.

Figure 4.1 A wire-based SHM system deployed on the Hedong Bridge, China: (a) Hedong
Bridge and (b) deploying a wired system.

Recent years have witnessed rapid advances in WSNs and an increasing interest in
using WSNs for SHM. Compared with traditional wire-based SHM systems, wireless
communication eradicates the need for wires and therefore represents a significant cost reduction and
convenience in deployment. A WSN-based SHM system can also achieve a finer grain of monitoring,
which potentially increases the accuracy and reliability of the system.

However, SHM is different in many aspects from most of the existing applications of WSNs.
Table 4.1 summarizes the main differences between SHM and a typical application of WSNs,
environmental monitoring, in terms of sensing and processing algorithms. Briefly speaking, sensor
nodes in an SHM system implement synchronized sensing with a relatively high sampling frequency.
Moreover, SHM algorithms for detecting damage operate on large batches of data (on the order of
thousands to tens of thousands of samples) and are usually centralized and complicated.


Table 4.1 Difference between SHM and Environmental Monitoring

                      | SHM                                 | Environmental Monitoring
Sensor type           | Accelerometers, strain gauges       | Temperature, light, humidity
Sampling pattern      | Synchronous sampling round by round | Not necessarily synchronized
Sampling frequency    | X00 to X000/s                       | X/s or X/min
Processing algorithms | On a bunch of data (>X0000);        | Simple, easy to be distributed
                      | centralized, computationally        |
                      | intensive                           |

Moreover, the difficulty of designing a WSN-based SHM system is different for short- and
long-term applications. Designing a WSN for short-term SHM is relatively easy. Generally speaking,
short-term SHM systems only need to address synchronized data sampling and reliable data
collection. The former task can be realized using various time synchronization protocols [5] and
resampling techniques [6]. For the latter, considering the high cost of wireless transmissions in
WSNs, wireless sensor nodes in a short-term SHM system can be equipped with a local storage
device, such as a microSD card or USB drive, to save the measured data in real time. The locally
stored data in wireless sensor nodes can be retrieved afterward by human operators.
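
To make the resampling idea concrete, the following minimal sketch aligns one node's samples to a common time grid by linear interpolation, assuming the node's clock offset has already been estimated by a time synchronization protocol. The function name, the 4 ms offset, and the 100 Hz rate are illustrative assumptions; the resampling technique in [6] may differ.

```python
def resample_linear(timestamps, samples, grid):
    """Linearly interpolate (timestamps, samples) at the instants in grid.

    Both timestamps and grid must be increasing. Grid points outside the
    recorded interval are clamped to the nearest end sample.
    """
    out = []
    j = 0
    for t in grid:
        if t <= timestamps[0]:
            out.append(samples[0])
            continue
        if t >= timestamps[-1]:
            out.append(samples[-1])
            continue
        # Advance to the recorded interval [timestamps[j], timestamps[j + 1]] containing t.
        while timestamps[j + 1] < t:
            j += 1
        t0, t1 = timestamps[j], timestamps[j + 1]
        w = (t - t0) / (t1 - t0)
        out.append((1 - w) * samples[j] + w * samples[j + 1])
    return out


# Hypothetical node sampling at 100 Hz whose clock runs 4 ms late.
offset = 0.004
timestamps = [offset + 0.01 * k for k in range(10)]
samples = [2.0 * t for t in timestamps]   # a ramp, so interpolation is exact
grid = [0.01 * k for k in range(1, 9)]    # common network-wide sampling instants
aligned = resample_linear(timestamps, samples, grid)
```

Linear interpolation is the simplest choice; for vibration data near the Nyquist limit a band-limited interpolator would usually be preferred.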

On the contrary, designing a WSN for long-term SHM is much more challenging. A long-term
SHM system not only needs a longer system lifetime and higher system reliability;
embedding SHM algorithms within the network also becomes a necessity. This task is difficult mainly
due to the following two factors:

First, although there exist some SHM algorithms that are intrinsically distributed, most of the 
traditional SHM algorithms are centralized, which means that their implementation requires the 
availability of the raw data from all the deployed sensor nodes. However, considering the high 
cost of transmitting raw data in a wireless environment, it is desirable that deployed wireless sensor 
nodes use their local information only or at most, exchange the information only with their nearby 
neighbors. To distribute these centralized SHM algorithms is a challenging task. 

Moreover, different from many applications of WSNs where simple aggregation functions
such as average, max, min, etc. are widely used, most of the classic SHM algorithms involve
complex matrix computation techniques such as singular value decomposition (SVD) and eigenvalue
decomposition, as well as other time domain or frequency domain signal processing methods.
Some of these algorithms can be computationally very intensive and require a large auxiliary
memory space for computation. For example, it was reported in [6] that implementing the SVD
on a small 48-by-50 data matrix that includes data only from a few sensor nodes would take
150 s on an Imote2 running at 100 MHz. Further, considering that the time complexity of SVD on a
data matrix $H \in \mathbb{R}^{n \times n}$ is $O(n^3)$ [7], the SVD on an $H$ including data from
a large number of sensor nodes is essentially infeasible for most of the available off-the-shelf
wireless sensor nodes. How to modify these resource-consuming SHM algorithms and make them
lightweight is also a challenging task.
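
The effect of cubic scaling can be illustrated with a back-of-the-envelope extrapolation. The sketch below assumes a pure $O(n^3)$ cost model with unchanged constant factors and treats the 48-by-50 matrix as roughly n = 48; both are simplifying assumptions, and the resulting figures are arithmetic illustrations, not measurements.

```python
def scaled_svd_time(base_seconds, base_n, new_n):
    """Extrapolate runtime under an assumed O(n^3) cost model."""
    return base_seconds * (new_n / base_n) ** 3


# Doubling the matrix dimension multiplies the estimated time by 2**3 = 8.
estimate = scaled_svd_time(150.0, 48, 96)
print(estimate)  # 1200.0 seconds under the cubic model
```

Under this model, every doubling of the matrix dimension costs a factor of eight in time, which is why including data from many nodes in one decomposition quickly becomes impractical on such hardware.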

In this chapter, we target the WSN-based SHM systems used for long-term monitoring purposes 
and mainly focus on how to design and implement SHM algorithms in resource-limited wireless 
sensor nodes. We first give a summary review of the recent efforts of embedding SHM algorithms 
within the WSNs. We then select an SHM algorithm that is widely used in civil engineering field, 
modal analysis, and describe how to implement it within WSNs. 


4.2 Related Works 


What distinguishes WSNs from traditional tethered structural monitoring systems is that the 
wireless sensor nodes are “smart” and able to process the response measurements they collected or 
received from others. Autonomous execution of damage detection algorithms by the wireless sensor 
represents an important step toward automated SHM. 

There exist a number of SHM algorithms developed by civil engineers, and they have shown
advantages in different structures and environmental conditions. However, based on how difficult
they are to implement in a typical WSN, they can be broadly divided into (1) inherently distributed
and lightweight, (2) inherently distributed but computationally intensive, and (3) centralized.

Some SHM algorithms can be implemented on a WSN directly without any modification. These
algorithms share two properties: (1) they are inherently distributed, which means that each sensor
node can make a decision on the condition of the structure based on its own measured data; and (2)
their complexity is low. For example, to detect damage, some SHM algorithms rely
on examining changes in vibration characteristics called natural frequencies. Natural frequencies
are global parameters of a structure, and under some assumptions on the input excitation and
environmental noise, they can be estimated from the time history data of each sensor node [8]. One
rough but simple approach to extracting natural frequencies is peak-picking [9]. In the peak-picking
method, the power spectral density (PSD) of the measured time history from a sensor node is
calculated using the fast Fourier transform, and then some “peaks” on the PSD are selected, whose
locations are taken as the identified natural frequencies. In a WSN-based SHM system
implementing this strategy, every wireless sensor node is able to identify a set of natural
frequencies using peak-picking without sharing data with other nodes. The peak-picking method itself
is lightweight; its drawback is that it gives only an approximate
estimate of the natural frequencies. An example of such a WSN system can be found in [10].
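
The peak-picking steps described above can be sketched as follows. This is a minimal illustration, assuming a raw (unaveraged) periodogram as the PSD estimate and a record length that is a power of two; the function names and the two-tone test signal are made up for the example and do not come from [9] or [10].

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def pick_peaks(signal, fs, num_peaks):
    """Return the num_peaks frequencies (Hz) of the largest local PSD maxima."""
    n = len(signal)
    spectrum = fft(signal)
    # One-sided periodogram, unnormalized: only relative heights matter here.
    psd = [abs(spectrum[k]) ** 2 for k in range(n // 2)]
    # A bin counts as a peak if it exceeds both of its neighbours.
    peaks = [k for k in range(1, n // 2 - 1) if psd[k - 1] < psd[k] > psd[k + 1]]
    peaks.sort(key=lambda k: psd[k], reverse=True)
    return sorted(k * fs / n for k in peaks[:num_peaks])


# Synthetic response with two "natural frequencies" at 8 Hz and 20 Hz.
fs = 128.0
n = 256
sig = [math.sin(2 * math.pi * 8.0 * i / fs) + 0.5 * math.sin(2 * math.pi * 20.0 * i / fs)
       for i in range(n)]
freqs = pick_peaks(sig, fs, 2)
print(freqs)
```

The frequency resolution is fs / n, which is one reason the estimates are only approximate: a true natural frequency falling between two bins is reported at the nearest bin.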

Some SHM algorithms, although inherently distributed, cannot be directly implemented in
a WSN due to their high computational complexity and the large memory space they require. Examples
of these algorithms include the auto-regressive and auto-regressive exogenous inputs (AR-ARX)
method [11], the damage localization assurance criterion (DLAC) method [12,13], and the wavelet
method [14,15]. The AR-ARX method is based on the premise that if there is damage in a
structure, the prediction model previously identified using the undamaged time history will not
be able to reproduce the newly obtained time series. In the AR-ARX method, a sensor node (1)
first identifies an AR model based on its collected data, (2) then searches through a database that
stores the AR models of the structure under a variety of environmental and operational conditions
to find the best match, and (3) based on this match, identifies an ARX model to obtain the decision
on the health status. While the first task is manageable on a node, the last two tasks are
computationally intensive and require large memory space. To address this problem, Lynch et al. [16]
modified the AR-ARX method and made it applicable to WSNs. The basic idea is very simple: after a
sensor node identifies its AR model, it sends the corresponding parameters to a central server and
lets the server finish the two remaining cumbersome tasks.
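
The node-side step, identifying an AR model and transmitting only its coefficients, can be sketched as below. The sketch assumes a Yule-Walker fit solved with the Levinson-Durbin recursion, which is one common way to estimate AR coefficients; the estimator actually used in [11] or [16] may differ, and the simulated AR(2) signal is purely illustrative.

```python
import random


def autocorr(x, max_lag):
    """Biased sample autocovariance for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    return [sum(xc[t] * xc[t + lag] for t in range(n - lag)) / n
            for lag in range(max_lag + 1)]


def fit_ar(x, order):
    """Yule-Walker AR fit via Levinson-Durbin.

    Returns a such that x[t] is predicted by sum(a[k] * x[t - 1 - k]).
    """
    r = autocorr(x, order)
    a = []
    err = r[0]
    for m in range(order):
        acc = r[m + 1] - sum(a[k] * r[m - k] for k in range(m))
        refl = acc / err                      # reflection coefficient
        a = [a[k] - refl * a[m - 1 - k] for k in range(m)] + [refl]
        err *= 1.0 - refl * refl              # updated prediction error
    return a


# Simulated node data: x[t] = 1.5 x[t-1] - 0.75 x[t-2] + noise.
random.seed(0)
x = [0.0, 0.0]
for _ in range(4000):
    x.append(1.5 * x[-1] - 0.75 * x[-2] + random.gauss(0.0, 1.0))

coeffs = fit_ar(x, order=2)
# The node would transmit len(coeffs) numbers instead of len(x) raw samples.
print(coeffs, len(coeffs), len(x))
```

The payload reduction is the point of the modification: a handful of coefficients travels over the radio, while the database search and ARX identification stay on the server.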

The DLAC method is also a distributed SHM algorithm. In the DLAC, each sensor node collects
its own data, calculates its PSD, identifies natural frequencies, and obtains damage information
by comparing the identified natural frequencies with the reference ones. In the DLAC, the natural
frequencies are identified using the rational fraction polynomial (RFP) method [17] instead of
the aforementioned peak-picking method, since the RFP can provide more accurate estimates.
However, implementing the DLAC is much more time consuming than peak-picking and
hence not feasible on most off-the-shelf wireless sensor nodes. To address this problem,
the DLAC has been tailored for WSNs by offloading its most time-consuming task, the
RFP, to a central server. After the server has finished the RFP, the natural frequencies
are transmitted back to the sensor nodes for the remaining tasks.
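
The comparison step that remains on the node can be sketched as a normalized correlation between the measured frequency changes and the changes predicted for each candidate damage location. The chapter does not state the formula, so the form below (the squared inner product divided by the product of squared norms, as commonly given for DLAC) is an assumption, and the candidate table is hypothetical.

```python
def dlac(measured_change, predicted_change):
    """Correlation in [0, 1] between measured and predicted frequency-change vectors."""
    dot = sum(m * p for m, p in zip(measured_change, predicted_change))
    norm_m = sum(m * m for m in measured_change)
    norm_p = sum(p * p for p in predicted_change)
    return dot * dot / (norm_m * norm_p)


def locate_damage(measured_change, candidates):
    """Return the candidate location whose predicted pattern correlates best."""
    return max(candidates, key=lambda loc: dlac(measured_change, candidates[loc]))


# Hypothetical predicted changes (Hz) of the first three natural frequencies
# for damage at two candidate locations, e.g. from an analytical model.
candidates = {
    "A": [-0.10, -0.02, -0.30],
    "B": [-0.01, -0.25, -0.05],
}
measured = [-0.05, -0.01, -0.15]   # same pattern as "A", smaller severity

best = locate_damage(measured, candidates)
score = dlac(measured, candidates["A"])
print(best, score)
```

Because the measure is normalized, it depends on the pattern of frequency changes rather than their magnitude, so damage of different severity at the same location still maps to the same candidate.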

The wavelet transform (WT) or the wavelet packet transform (WPT) of the time histories
collected from individual sensor nodes has also been used for damage detection [14,15].
Wavelet-based approaches rely on the assumption that the signal energy in certain frequency
bands extracted from the WT/WPT will change after damage. However, the traditional WT
and WPT are computationally intensive and require large auxiliary memory space, and thus are not
suitable for WSNs. To address this problem, the lifting scheme wavelet transform is proposed in
[18], which has the advantages of fast implementation, fully in-place calculation without auxiliary
memory, and integer-to-integer mapping. This modification of the WT and WPT has proven to be
very effective in improving the efficiency of WSNs using WT/WPT to detect damage.
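
The three properties named above (fast, in-place, integer-to-integer) can be seen in the simplest lifting construction, the integer Haar transform. The sketch below uses Haar only because it is the shortest example; the wavelet actually used in [18] is not specified in this chapter and may be different.

```python
def haar_lift_forward(x):
    """One level of integer Haar lifting, in place; len(x) must be even.

    Afterwards even indices hold approximation coefficients and odd
    indices hold detail coefficients.
    """
    for i in range(0, len(x), 2):
        x[i + 1] -= x[i]           # predict: detail = odd - even
        x[i] += x[i + 1] >> 1      # update: approx = even + floor(detail / 2)


def haar_lift_inverse(x):
    """Undo haar_lift_forward exactly by reversing the lifting steps."""
    for i in range(0, len(x), 2):
        x[i] -= x[i + 1] >> 1
        x[i + 1] += x[i]


original = [5, 7, 10, 4, -3, 8, 0, 1]   # e.g. raw integer ADC readings
coeffs = list(original)
haar_lift_forward(coeffs)               # no auxiliary buffer is needed
restored = list(coeffs)
haar_lift_inverse(restored)
print(coeffs, restored)
```

Each lifting step only adds or subtracts integers within the same buffer, so the transform needs no floating-point unit and no scratch memory, and it is exactly invertible.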

Different from the distributed SHM algorithms mentioned earlier, with which a decision can be
made based on data from individual sensor nodes, a large percentage of SHM algorithms are
centralized. They require the raw data from all the deployed sensor nodes. Embedding centralized
SHM algorithms within a WSN is not an easy task. In this chapter, we give an example of how
centralized SHM algorithms can be made distributed in a WSN. Since a large variety
of algorithms for SHM have been proposed by civil engineers, we select one technique called
modal analysis. Modal analysis is one of the most fundamental techniques in SHM. Using modal
analysis, structural vibration characteristics, called the modal parameters, are identified, which
in turn give damage-associated information.


4.3 Background: Modal Analysis 


In this section, we first give some basic background associated with modal analysis and then 
describe, in a step-by-step manner, how this can be embedded within wireless sensor nodes. 


4.3.1 Modal Parameters 


Every structure has a tendency to oscillate with much larger amplitude at some frequencies than
others. These frequencies are called natural frequencies. (This concept was mentioned in Section 4.2.)
When a structure is vibrating at one of its natural frequencies, the corresponding vibrational
pattern it exhibits is called the mode shape for this natural frequency.

For example, for a structure with n degrees of freedom (DOFs), its natural frequency set and
mode shapes are denoted, respectively, as

$$f = [f^1, f^2, \ldots, f^n] \tag{4.1}$$

$$\Phi = [\Phi^1, \Phi^2, \ldots, \Phi^n] =
\begin{bmatrix}
\phi_1^1 & \phi_1^2 & \cdots & \phi_1^n \\
\phi_2^1 & \phi_2^2 & \cdots & \phi_2^n \\
\vdots & \vdots & \ddots & \vdots \\
\phi_n^1 & \phi_n^2 & \cdots & \phi_n^n
\end{bmatrix} \tag{4.2}$$

where
$f^k$ ($k = 1, \ldots, n$) is the kth natural frequency
$\Phi^k$ ($k = 1, \ldots, n$) is the mode shape corresponding to $f^k$
$\phi_i^k$ ($i = 1, 2, \ldots, n$) is the value of $\Phi^k$ at the ith DOF

For convenience, $f^k$ and $\Phi^k$ are also called the modal parameters corresponding to the kth
mode of a structure. As an example, Figure 4.2 illustrates the first three mode shapes of a typical
cantilevered beam, extracted from the measurements of 12 deployed sensor nodes. Each mode shape
corresponds to a certain natural frequency of this cantilevered beam.

Modal parameters are determined only by the physical property of structure (i.e., mass, stiffness, 
damping, etc.). When damage occurs on a structure, its internal property will be changed, and 
consequently, modal parameters will be deviated from those corresponding to this structure in the 
healthy condition. Therefore, by examining the changes in these modal parameters, damage on the 


Smart Wireless Sensor Nodes for Structural Health Monitoring m 83 


Figure 4.2 Mode shapes of a typical cantilevered beam: (a) original beam, (b) mode shape 1, 
(c) mode shape 2, and (d) mode shape 3. 


structure can be roughly detected and located. Modal parameters can also be used as the inputs for 
finite element model (FEM) updating [19], which is able to precisely locate and quantify structural 


damage. 
It should also be noted that, unlike the natural frequency vector $f$, each mode shape vector $\Phi^k$ 
has an element corresponding to each sensor node. Moreover, the elements in $\Phi^k$ only represent the 
relative vibration amplitudes of the structure at the corresponding sensor nodes. In other words, two mode 
shape vectors $\Phi^i$ and $\Phi^j$ are the same if there exists a nonzero scalar $\zeta$ that satisfies $\Phi^i = \zeta \Phi^j$. 
This property leads to one of the important constraints when designing distributed modal analysis. 
Details about this constraint will be given in the next section. 
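
The scale ambiguity described above can be made concrete with a short check. The following is only a minimal sketch, assuming real-valued mode shapes stored as plain Python lists; the function name `same_mode_shape` and the tolerance `tol` are illustrative choices, not part of the chapter. It tests whether two vectors are parallel, which is equivalent to one being a nonzero scalar multiple of the other.

```python
import math


def same_mode_shape(phi_a, phi_b, tol=1e-9):
    """Return True if phi_a equals zeta * phi_b for some nonzero scalar zeta.

    Assumes real-valued mode shapes; a zero vector is rejected because it
    carries no relative-amplitude information.
    """
    if len(phi_a) != len(phi_b):
        return False
    dot_ab = sum(a * b for a, b in zip(phi_a, phi_b))
    norm_a = math.sqrt(sum(a * a for a in phi_a))
    norm_b = math.sqrt(sum(b * b for b in phi_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return False
    # Cauchy-Schwarz: |<a, b>| equals |a||b| exactly when a and b are parallel.
    return abs(abs(dot_ab) - norm_a * norm_b) <= tol * norm_a * norm_b
```

For instance, `[1, 2, 3]` and `[-2, -4, -6]` describe the same mode shape under this test, while `[1, 2, 3]` and `[1, 2, 4]` do not.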

To identify modal parameters, civil engineers have developed a large number of classic modal 
analysis algorithms, including stochastic subspace identification (SSI) [20], the eigensystem realization 
algorithm (ERA) [21], frequency domain decomposition [22], and enhanced frequency 
domain decomposition [23]. In this chapter, we choose the ERA for modal parameter identification 
and briefly introduce how the modal parameters are identified using it. 


4.3.2 The ERA 


In this section, we briefly introduce the ERA. The ERA is able to give accurate modal parameter 
estimates using output-only data and has been widely used by civil engineers for many years. 

Assume a total of m sensor nodes are deployed on a structure and the collected data are denoted 
as $y(k) = [y^1(k), y^2(k), \ldots, y^m(k)]$ ($k = 1, \ldots, N_{ori}$), where $y^i(k)$ is the data sampled by the ith 
sensor at the kth time step and $N_{ori}$ is the total number of data points collected in each node. To obtain 
modal parameters, the ERA first identifies, from the measured responses $y(k)$, a series of parameters 
$Y(k)$ called Markov parameters. The Markov parameters $Y(k)$ are calculated as the cross-correlation 
function (CCF) of the measurement $y$ and a reference signal $y^{ref}$: 


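$$ Y(k) = \mathrm{CCF}_{y y^{ref}}(k) = \begin{bmatrix} \mathrm{CCF}_{y^1 y^{ref}}(k) \\ \mathrm{CCF}_{y^2 y^{ref}}(k) \\ \vdots \\ \mathrm{CCF}_{y^m y^{ref}}(k) \end{bmatrix} \tag{4.3} $$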


where $\mathrm{CCF}_{y^i y^{ref}}$ is the CCF between the ith measurement $y^i$ and the reference $y^{ref}$. Generally 
speaking, the measured signal from any deployed sensor node can be selected as $y^{ref}$. To accurately 
estimate $\mathrm{CCF}_{y^i y^{ref}}$, we first use Welch's averaged periodogram method [24] to calculate the 
cross-spectral density (CSD) between $y^i$ and $y^{ref}$; the inverse fast Fourier transform (ifft) is then 
applied to the CSD to obtain the CCF. 

In Welch's method, to calculate the CSD of two signals x and y, x and y are first divided 
into $n_d$ overlapping segments. The CSD of x and y, denoted as $G_{xy}$, is then calculated as 

$$ G_{xy}(\omega) = \frac{1}{n_d N} \sum_{i=1}^{n_d} X_i^*(\omega) Y_i(\omega) \tag{4.4} $$

where 
$X_i(\omega)$ and $Y_i(\omega)$ are the Fourier transforms of the ith segment of x and y 
$^*$ denotes the complex conjugate 

N is the number of data points in each segment of x (or y), and also in the obtained $G_{xy}(\omega)$. N is generally 
taken as a power of two, such as 1024 or 2048, to give reasonable results. To decrease the noise, $n_d$ typically 
ranges from 10 to 20. 
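After obtaining the CSD of $y^{ref}$ with each response in $y$, the Markov parameters $Y(k)$ are then 
calculated as the ifft of the obtained CSD: 

$$ Y(k) = \begin{bmatrix} \operatorname{ifft}(G_{y^1 y^{ref}}) \\ \operatorname{ifft}(G_{y^2 y^{ref}}) \\ \vdots \\ \operatorname{ifft}(G_{y^m y^{ref}}) \end{bmatrix}(k) \tag{4.5} $$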
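
The first stage of the ERA (segment averaging of the CSD, followed by an inverse transform) can be sketched in a few lines. This is a minimal illustration rather than the chapter's implementation: it assumes a rectangular window and 50% segment overlap, uses a naive O(N^2) DFT so that it stays self-contained, and the function names (`dft`, `welch_csd`, `markov_parameters`) are hypothetical. A practical system would use an FFT library and a tapered window.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform; the inverse includes the 1/N factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                for t in range(n))
        out.append(s / n if inverse else s)
    return out


def welch_csd(x, y, seg_len, overlap=0.5):
    """Average conj(X_i) * Y_i over overlapping segments, scaled by 1/(n_d * N)."""
    step = max(1, int(seg_len * (1 - overlap)))
    starts = range(0, len(x) - seg_len + 1, step)
    n_d = len(starts)
    acc = [0j] * seg_len
    for s in starts:
        seg_x = dft(x[s:s + seg_len])
        seg_y = dft(y[s:s + seg_len])
        for w in range(seg_len):
            acc[w] += seg_x[w].conjugate() * seg_y[w]
    return [a / (n_d * seg_len) for a in acc]


def markov_parameters(responses, reference, seg_len):
    """Row i holds the CCF of response i with the reference (inverse DFT of the CSD)."""
    rows = []
    for y_i in responses:
        csd = welch_csd(y_i, reference, seg_len)
        rows.append([v.real for v in dft(csd, inverse=True)])
    return rows
```

With this scaling, the correlation of a unit-amplitude sinusoid with itself is 0.5 at lag zero and -0.5 at a lag of half a period, which gives a quick sanity check of the pipeline.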


Having obtained the Markov parameters Y(1), Y(2), ..., the ERA begins by forming the Hankel 
matrix composed of these Markov parameters and then applies the singular value decomposition (SVD) 
to obtain the modal parameters. The detailed procedure of the ERA is summarized in Figure 4.3. It can be 
seen that the ERA can be largely divided into two stages. In the first stage, the Markov parameters are 
identified. These Markov parameters are then used to identify the modal parameters in the second stage. 

Figure 4.3 Procedures of the ERA algorithm. 

In the following three sections, we will introduce, in a step-by-step manner, how these centralized 
modal analysis algorithms are tailored for WSNs. 


4.4 Distributed Modal Analysis 


4.4.1 Stage 1: Try to Distribute the Initial Stage of Modal Analysis Algorithms 


To design a distributed version of centralized modal analysis algorithms, the detailed procedures in 
the ERA should be analyzed. It can be seen from Figure 4.3 that the CSD between 
the time history of a reference sensor and that of each sensor is calculated first. Therefore, if the 
CSDs can be calculated in a way suitable for WSNs, the efficiency of these algorithms in a WSN 
can be significantly improved. Nagayama and Spencer [6] proposed a decentralized approach, 
illustrated in Figure 4.4a, to calculate the CSDs without collecting all the 
measured data. In this strategy, the reference node broadcasts its measured time history record to 
all the remaining nodes. After receiving the reference signal, each node calculates a CSD estimate 
and then transmits it back to a sink node where the remaining portions of the algorithms are 
implemented. Because the amount of data in the CSDs is much smaller than the original time 
history record, the amount of transmitted data in this approach is much smaller than in the traditional 
one, where all the raw data are transmitted to the sink. Moreover, part of the computation load that 
was concentrated at the sink node (i.e., the calculation of the CSDs) is 
off-loaded to the other nodes, which is favorable for a homogeneous WSN in which no “super 
nodes” exist. 
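
To give a feel for the saving, the transmitted data volume of the two strategies can be compared with simple arithmetic. The cost model below is a rough sketch under stated assumptions, not a result from [6]: it counts real values sent, treats the reference broadcast as a single transmission, counts each complex CSD point as two real values, and assumes 50% segment overlap when deriving the record length. All function names are hypothetical.

```python
def raw_points(n_d, seg_len, overlap=0.5):
    """Samples a node must record to form n_d segments of seg_len points."""
    step = int(seg_len * (1 - overlap))
    return seg_len + (n_d - 1) * step


def centralized_cost(m, n_ori):
    """Each of the m nodes ships its raw record of n_ori points to the sink."""
    return m * n_ori


def decentralized_cost(m, n_ori, seg_len):
    """Reference node broadcasts its record once; the other m - 1 nodes each
    return one CSD of seg_len complex points (2 * seg_len real values)."""
    return n_ori + (m - 1) * 2 * seg_len
```

With 12 nodes, N = 1024 and 20 segments, each record holds 10,752 samples, so the centralized scheme moves 129,024 values while the decentralized scheme moves 33,280 under this model.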

This decentralized approach is further improved in [25], where the decentralized random decrement 
technique (RDT) is adopted to calculate the CSDs. With the help of the RDT, the reference 
node does not even need to send the whole measured time history record; only some trigger points 
in the time history found by the RDT need to be broadcast. Once the trigger points are received, 
each node calculates the CSDs, which are subsequently collected at the sink node to continue the 
remaining damage identification procedures. Because the trigger information is in general 
much shorter than the time history record broadcast by the reference node, this RDT-based decentralized 
approach can considerably reduce wireless data transmissions. This approach is illustrated 
in Figure 4.4b. 

Figure 4.4 Two approaches of calculating the CSDs in a distributed way: (a) the approach 
proposed in [6] and (b) the approach proposed in [25]. 


4.4.2 Stage 2: Divide and Conquer 


If only the CSD estimations in the modal analysis algorithms are distributed, some problems 
remain, since the CSDs of all the nodes still need to be wirelessly transmitted to 
a sink where the remaining steps of the ERA are finished. The first problem is associated with 
communication. Transmitting the CSDs of all the sensor nodes to the sink is a challenging task, 
considering that the CSD of each node contains thousands of points that are usually in a 
double-precision floating-point format. In addition, in a large civil 
structure, the CSDs usually need to be transmitted in a multi-hop manner, which considerably 
degrades the performance of the system. The second problem is associated with computation. 
When the sink node receives the CSDs from the deployed sensor nodes, the computational resources 
required to identify modal parameters usually exceed the capacity of most existing off-the-shelf 
wireless sensor nodes, especially when the number of sensor nodes is large. Therefore, a PC is 
generally used as the sink node, which increases the system cost and the difficulty of deployment. 

To address the aforementioned problems, instead of using data from all the sensor nodes in a 
batch manner, we can divide the deployed sensor nodes into clusters and implement the ERA in 
each cluster. We then obtain a set of natural frequencies and “local” mode shapes, and these 
cluster-based modal parameters are “merged” together afterward. This is very similar to the “divide and 
conquer” strategy widely adopted by computer scientists to solve various mathematical problems. 
A minor difference is that classic “divide and conquer” algorithms divide the original 
problem recursively, while in this cluster-based 
ERA, the division of a WSN needs to be carried out only once. 

This cluster-based ERA is illustrated in Figure 4.5. In this approach, the whole network is 
partitioned into a number of clusters. A cluster head (CH) is designated in each cluster to perform 
intra-cluster modal analysis using traditional centralized modal analysis algorithms. The identified 
modal parameters in all clusters are then assembled together to obtain the modal parameters for the 
whole structure. Compared with the centralized approach, the cluster-based approach has at least 
three advantages. The first advantage is associated with wireless 
communication. By dividing sensor nodes into single-hop clusters, in which the sensor nodes of each 
cluster are within single-hop communication range of their CH, we can avoid multi-hop relay and 
thus reduce the corresponding wireless communications. 

Second, compared with the centralized approach, the computational resources required in each 
cluster to compute the modal parameters are significantly decreased. By reducing the computational 
complexity, it becomes possible to use common wireless sensor nodes instead of a PC to implement modal 
analysis algorithms. 

The third advantage of this approach is that, by dividing sensor nodes into clusters, the computation 
of the ERA can be made parallel. All the CHs can work at the same time, so the overall 
computation time is decreased. 

However, clustering must satisfy some constraints. First, clusters must overlap with each other. 
This constraint is a prerequisite for the local mode shapes to be stitched together. As 
introduced in Section 4.3, mode shape vectors identified using the ERA only represent the relative 
vibration amplitudes at the sensor nodes involved. Therefore, mode shapes identified in different 
clusters cannot be directly assembled. This is demonstrated in Figure 4.6a, where the 
12 deployed sensor nodes in Figure 4.2 are partitioned into three clusters to identify the third mode shape. 
Although the mode shape of each cluster is correctly identified, we still cannot obtain the mode 
shape for the whole structure. The key to solving this problem is overlapping. We must ensure that 
each cluster has at least one node that also belongs to another cluster and that all the clusters are 
connected through the overlapping nodes. For example, in Figure 4.6b, mode shapes identified 
in each of the three clusters can be assembled together with the help of the overlapping nodes 
5 and 9. This requirement of overlapping must be satisfied when dividing sensor nodes into 
clusters. 

Figure 4.6 Mode shape assembling when (a) clusters do not overlap and (b) clusters overlap. 
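
The stitching step can be sketched as follows. This is an illustrative sketch only, assuming each local mode shape is a dictionary from node id to real amplitude and that clusters are supplied in an order where each one overlaps the part already assembled; the function name `assemble_mode_shape` and the choice of the largest-amplitude shared node as the scaling reference are assumptions, not the chapter's procedure.

```python
def assemble_mode_shape(local_shapes):
    """Stitch local mode shapes (dicts: node id -> amplitude) into a global one.

    Each cluster after the first is rescaled so that it agrees with the
    already-assembled shape at a shared node with nonzero amplitude.
    """
    global_shape = dict(local_shapes[0])
    for local in local_shapes[1:]:
        shared = [n for n in local if n in global_shape and local[n] != 0]
        if not shared:
            raise ValueError("cluster does not overlap the assembled part")
        # Use the shared node with the largest local amplitude as the reference.
        ref = max(shared, key=lambda n: abs(local[n]))
        scale = global_shape[ref] / local[ref]
        for node, value in local.items():
            global_shape.setdefault(node, scale * value)
    return global_shape
```

If a cluster shares no node with the rest, the scale factor is undefined and the function raises an error, which mirrors the situation in Figure 4.6a.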

Another constraint is the number of sensor nodes in a cluster. To avoid an under-determined 
problem, the ERA also requires that the number of sensor nodes in each cluster be larger 
than the number of modal parameters to be identified. 
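
Both constraints can be checked mechanically before a clustering is accepted. The sketch below is a hypothetical helper, not part of the chapter: clusters are given as sets of node ids, `n_params` stands for the number of modal parameters to be identified, and connectivity through overlapping nodes is tested with a simple graph search over the clusters.

```python
def clusters_are_valid(clusters, n_params):
    """Check the size and overlap constraints for a list of node-id sets."""
    if not clusters:
        return False
    # Size constraint: every cluster needs more nodes than n_params.
    if any(len(c) <= n_params for c in clusters):
        return False
    # Overlap constraint: all clusters must be linked through shared nodes.
    reached = {0}
    frontier = [0]
    while frontier:
        i = frontier.pop()
        for j, other in enumerate(clusters):
            if j not in reached and clusters[i] & other:
                reached.add(j)
                frontier.append(j)
    return len(reached) == len(clusters)
```

For the 12-node example, clusters {1..5}, {5..9}, and {9..12} pass the check with `n_params = 3`, whereas the non-overlapping partition {1..4}, {5..8}, {9..12} fails.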

Given a WSN, different clustering strategies will generate clusters with different sizes and 
network topologies and can therefore result in different energy consumption, wireless bandwidth 
usage, delay, etc. Correspondingly, clustering can be optimized according to different objective 
functions that may vary for different hardware, wireless communication protocols, and other specific 
scenarios of WSN-based SHM systems. For example, for a WSN in which wireless sensor nodes 
are battery powered, energy efficiency is an important issue, so it matters how the deployed 
sensor nodes are divided such that the energy consumption is minimized. This optimal clustering 
problem is studied in [26]. Besides energy consumption, other possible objective functions for 
clustering include the number of wireless transmissions, load balance, and delay. 


4.5 WSN-Cloud SHM: New Possibility toward SHM of Large Civil Infrastructures 


In the SHM algorithms described earlier, we introduced how to use wireless sensor nodes to estimate 
modal parameters. However, to obtain the damage location and further quantify damage severity, we 
generally still have one step to go. The estimated modal parameters are sent to a server where the 
FEM of the structure under monitoring is updated. This procedure is called model updating [19]. 
In FEM updating, parameters of the structure’s FEM, which are directly associated with the physical 
properties of the structure, are adjusted to reduce a penalty function based on residuals between the 
modal parameters estimated from measured data and the corresponding FEM predictions. The 
updated FEM directly provides information on damage location and severity. However, for a large 
civil infrastructure where an accurate FEM can contain tens of thousands or even hundreds of 
thousands of small “structural elements,” model updating is extremely resource demanding and 
can take hours or days even on a powerful PC. 

To alleviate the computational burden of the server as well as to decrease the associated delay, 
civil engineers have proposed a scheme called multi-scale SHM [27,28]. In this strategy, two 
different FEMs, one coarse and one refined, are established for a given structure. The coarse FEM 
consists of a smaller number of large-sized structural elements, and the refined FEM contains a large 
number of small-sized elements. Correspondingly, updating the coarse FEM takes much less 
computation time than updating the refined one. Initially, estimated modal parameters are used to update the 
coarse FEM. Only when damage is detected on this coarse FEM is the refined FEM updated 
for detailed damage localization and quantification. This multi-scale strategy can significantly 
decrease the computational load for the server. Moreover, in this strategy, the updating of the 
coarse FEM only requires the “coarse” modal parameters, whose identification does not need all 
the deployed sensor nodes. Therefore, it is possible that only part of the deployed sensor nodes 
need to work. This can increase the lifetime of the WSN. 

However, the server of an SHM system using this multi-scale strategy still needs to be powerful 
enough to handle the task of updating the refined FEM when damage is suspected. 
Because the server runs coarse-FEM updating most of the time, when the computational 
load is low, it is a waste to purchase a powerful server that is under-loaded most of the time. 

Cloud computing, being able to provide dynamically scalable resources, can be a suitable 
substitute for the server used in the aforementioned SHM system. Instead of purchasing a powerful 
server, we can buy computational resources from a cloud provider and only pay for the resources 
we have used. This “pay as you go” business pattern can dramatically reduce the total cost of SHM 
systems. Because the computational demand of multi-scale SHM varies dynamically between coarse 
and refined updating, cloud computing is a natural platform for this application. 

A future SHM system is envisioned as shown in Figure 4.7. A large number of wireless sensor 
nodes are deployed at different locations of the structure under monitoring, and a gateway node, 
serving as the in-field commander, is able to communicate with both the WSN and the Internet. 
Initially, under the command of the gateway node, part of the wireless sensor nodes are activated to 
sample and compute the modal parameters. The modal parameters are sent to the gateway and are 
then forwarded to the cloud servers where the model updating is implemented. The updating results 
are transmitted back to the gateway. If damage is not detected, the aforementioned procedures are 
repeated after every predetermined period of time. Once damage is found on the coarse FEM, more 
wireless sensor nodes are activated by the gateway node, and refined modal parameters are 
then sent from the gateway to the cloud side to implement FEM updating on the refined FEM. 
We call this hybrid architecture “WSN-Cloud SHM.” 

Figure 4.7 Architecture of WSN-Cloud SHM. 
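
The control flow at the gateway can be summarized as a short sketch. Everything here is hypothetical: `sample_modal_params` stands in for whatever WSN routine returns modal parameters from a set of active nodes, and `update_fem` stands in for a cloud-side model-updating service that reports whether damage was detected. The sketch only shows the coarse-then-refined escalation described above.

```python
def monitoring_cycle(sample_modal_params, update_fem, coarse_nodes, all_nodes):
    """Run one monitoring cycle and return (level, updating_result).

    sample_modal_params(nodes) -> modal parameters (hypothetical WSN call)
    update_fem(level, params)  -> dict with a boolean "damage" entry
                                  (hypothetical cloud call)
    """
    # Routine operation: only part of the nodes is activated.
    coarse_params = sample_modal_params(coarse_nodes)
    coarse_result = update_fem("coarse", coarse_params)
    if not coarse_result["damage"]:
        return "coarse", coarse_result
    # Damage suspected: activate more nodes and update the refined FEM.
    refined_params = sample_modal_params(all_nodes)
    return "refined", update_fem("refined", refined_params)
```

A gateway would call this function once per monitoring period; the expensive refined update is requested from the cloud only in cycles where the coarse update flags damage.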

Much remains to be explored before a practical WSN-Cloud SHM system can be realized. 
For example, most of the existing applications of cloud computing, particularly web-based applications, 
can use the “MapReduce” programming model [29]. However, unlike web-associated 
applications such as text tokenization, indexing, and search, implementing FEM updating in the 
form of MapReduce is not straightforward and needs in-depth investigation. Moreover, beyond 
the cloud user's point of view, how to provide different levels of cloud-based SHM services for 
infrastructure owners is an interesting question. The answers to these questions will lead to great 
economic benefit in the future. 

In this chapter, we propose a new approach for the design of resilient sensor networks using a 
knowledge representation (KR) formalism that will facilitate reasoning and collaboration among 
sensors. This approach utilizes abstract simplicial complexes (ASCs), which serve as the building 
blocks of this KR, permitting the encoding of more complex relationships than models based 
on graph theory due to their additional structure. These mathematical constructions exist in the 
framework of combinatorial algebraic topology (CAT), which uses discrete methods to formalize 
how algebraic equations relate to the structure of a space. Its algebraic nature allows formulation of 
algorithms and automation of decision making. Electrical power systems are a natural choice for an 
application of this KR because of their large size, distributed information, and complex dynamics. 
The developed approach can be used to design resilient sensor networks that can assist the power 
management, control, and monitoring systems in inferring and predicting the system state, as well 
as determining the health of all components, including sensors. This approach is illustrated on an 
IEEE 14 bus power system. 


5.1 Introduction 


Sensing and measurement of large-scale system variables such as weather conditions (wind speed, 
temperature, etc.), power flow, and market prices are essential for improving the monitoring 
and control of engineered systems. This information can be gathered by means of a distributed 
heterogeneous network of sensors. The heterogeneous nature of knowledge (e.g., electrical, weather, 
etc.) results in significant challenges involving control and monitoring systems. Usually, engineers 
design the sensing and data processing for control and monitoring by considering one particular 
type of knowledge for a given physical domain. Advanced sensing techniques will collect valuable 
information about the system conditions throughout the network for multiple physical domains. 
By considering these multiple forms of knowledge in concert, one can improve problem solving. 

Using new communications such as the Internet and reliable media such as wireless, broadband 
power lines, or fiber optics, a network of distributed sensors will have the ability to communicate 
quickly in order to infer valuable information. For example, in large-scale systems, sensors will 
have the ability to interact and coordinate with control and monitoring systems for optimal system 
control and decision making. 

While the study of sensor network design for the state estimation problem is fairly widespread 
[1,13], the use of sensor networks in distributed, large-scale systems for managing complexity and 
uncertainty, as well as for real-time decentralized control and monitoring, is seen as an increasingly 
important avenue of research. As such, our objective in this research is to develop tools that assist sensors to infer 
the system state, detect and diagnose failures, as well as to learn and adapt to their changing 
environment. In order to achieve these objectives, we introduce an approach based on a sensor 
management system (SMS). This SMS will be able to quickly collect important data and process 
it as well as monitor the health of sensors and restore erroneous or missing data. This system will 
inform (1) the controller about the state of the system, and (2) the monitoring system about the 
health of sensors and assist decisions. By optimizing the communication pathway between sensors 
and the control or monitoring system, as well as minimizing the number of sensors required for 
performing a task, we can improve system resilience and avoid problems, such as high maintenance 
cost, as well as latency and congestion. 

KR is the key to the design of an intelligent SMS because it provides the basic cognitive 
structure of reasoning and machine learning. This can enable an SMS to predict possible outcomes 
and required actions from its perception of a situation. KR and reasoning is the study of thought 
as a computational process. It is concerned with how a system uses its knowledge to infer solutions 
and make decisions. While logical reasoning produces answers to problems based on fixed rules, 
analogical reasoning produces conclusions by extending a known comparison, strategy, or concept 
in one domain called the source, to another domain, called the target. Analogies can help gain 
understanding of new concepts by comparing them with past experiences. There are many references 
such as [3,7–10] that discuss the theory of analogical reasoning in the context of the human mind. 
Hadamard in [10] suggests that human creative insight comes from the application of an analogy 
from a different domain of knowledge, and Collins in [3] explores its role in individuals with 
incomplete knowledge of the source domain. However, most work done to represent analogical 
reasoning focuses on graph-theoretic representations which are limited in the dimension of allowed 
relations. 

The goal of this chapter is to develop tools for analogical reasoning and problem solving 
using CAT. We present a method to form solution strategies as well as make analogies, applying 
successful strategies to new situations. These methods use ASCs as a formalism for KR. An ASC is 
an extension of a graph, where the idea of an edge is generalized to arbitrary dimensions. This allows 
much richer structures such as the mutual adjacency of any number of vertices. Models based on 
simplicial complexes can encode more complex relationships than the traditional graph-theoretic 
models due to their additional structure, making ASCs a good choice for the building blocks of 
an analogy-capable KR. In [5], the authors developed a computer model for analogy solving using 
simplicial complexes. The source and the target analogs are represented as simplices and the analogy 
is modeled as topological deformations of these simplices along a chain sequence according to a set 
of rules. However, this proposed approach was adapted to the specific problem of solving IQ test 
questions, and was not applied to general intelligent systems. In order to apply this approach to 
engineering systems we need to develop a basic mathematical language of reasoning. This will be 
used to (1) model the knowledge collected about the system (e.g., mathematical constraints, etc.), 
(2) use the KR technique, and (3) develop reasoning process using analogies. The reasoning process 
will then analyze system conditions, build strategies for communication and interaction between 
sensors, and initiate necessary actions in the case of missing data or loss of sensors. The 
proposed sensing system offers fast data collection and processing, as well as reliability and 
resilience in the presence of failures. This new intelligent sensing system promises to increase 
system resilience and improve the performance of system monitoring and control. 


5.2 Background 


In this section, we present an overview and background of the existing mathematical tools applied 
in this chapter. Simplicial complexes form the backbone of the proposed knowledge structure. We 
first define some important concepts that involve ASCs and relations [4], and then discuss how 
ASCs can be used to represent knowledge [5]. 


5.2.1 Simplicial Complexes 


Simplicial complexes are abstract structures developed in algebraic topology [14]. Geometrically, 
simplices are an extension of triangles and tetrahedra to arbitrary dimensions. 


Definition 5.1 (Simplex, face, facet) An m-simplex $S = \{x_1, \ldots, x_{m+1}\}$ is the smallest convex 
set in $\mathbb{R}^m$ containing the given set of m + 1 points (Figure 5.1). A simplex formed from a subset 
of the points of S is called a face of S, and a maximal simplex under inclusion is called a facet. 

Figure 5.1 m-simplexes for 0 ≤ m ≤ 3. 


Simplices may be glued together along their faces. Such a geometric structure is called a simplicial 
complex $\Delta$ if every face of every simplex is also considered to be a simplex in $\Delta$. A simplicial 
complex that is contained in another is called a sub-complex. However, we are only interested in 
the combinatorial properties of a simplicial complex. With this in mind, we consider an ASC. 


Definition 5.2 (Abstract simplicial complexes) An ASC is a collection $\Delta$ of subsets of a set of 
vertices $V = \{x_1, \ldots, x_n\}$, such that if S is an element of $\Delta$ then so is every nonempty subset of 
S [11,14]. The dimension of an ASC is the dimension of its largest facet. An ASC is called pure if 
all of its facets have the same dimension [6]. 
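
Definition 5.2 translates directly into code. The following is a small sketch, assuming simplices are represented as Python frozensets of vertex labels; the function names are illustrative. It builds an ASC as the downward closure of a list of facets and then recovers the facets, the dimension, and the purity test.

```python
from itertools import combinations


def asc_from_facets(facets):
    """Return every nonempty subset of the given facets as a set of frozensets."""
    simplices = set()
    for facet in facets:
        verts = sorted(facet)
        for k in range(1, len(verts) + 1):
            simplices.update(frozenset(c) for c in combinations(verts, k))
    return simplices


def facets_of(simplices):
    """Simplices that are maximal under inclusion."""
    return {s for s in simplices if not any(s < t for t in simplices)}


def dimension(simplices):
    """Dimension of the largest facet (k vertices give dimension k - 1)."""
    return max(len(s) for s in simplices) - 1


def is_pure(simplices):
    """True if all facets have the same dimension."""
    return len({len(f) for f in facets_of(simplices)}) == 1
```

For example, the complex generated by a triangle {a, b, c} and an edge {c, d} has nine simplices, dimension 2, and is not pure, because its two facets have different dimensions.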


5.2.2 Predicate Relations on Simplicial Complexes 


Dowker provides a way to represent binary relations as a simplicial complex [4]. Let $R \subset A \times B$ 
be a binary relation between sets A and B. As a subset of $A \times B$, $(a, b) \in R$ means that a is related 
to b by the relation R, that is, aRb. For each fixed $b \in B$, $\Delta_A(B)$ describes what elements of A are 
related to b and vice versa. In this way, two dual simplicial complexes,* $\Delta_A(B)$ and $\Delta_B(A)$, are 
associated to R. This can be taken further by replacing the arbitrary set B with a set of predicates 
$P = \{p_1, \ldots, p_n\}$. This allows us to define R so that $(a, p_i) \in R$ if and only if a satisfies the predicate 
$p_i$, that is, $p_i(a)$ is true. $\Delta_A(P)$ tells what elements of A satisfy each predicate $p_i \in P$, whereas 
$\Delta_P(A)$ tells all the predicates satisfied by each $a \in A$. The use of simplicial complexes to represent 
data is part of a larger theory called Q-analysis [2]. 
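
The two dual views of a relation can be computed in one pass. This is a minimal sketch, assuming the relation R is given as a set of (object, predicate) pairs; the function name `dowker_simplices` is hypothetical. It returns the generating simplex for each predicate (the objects that satisfy it) and for each object (the predicates it satisfies).

```python
from collections import defaultdict


def dowker_simplices(relation):
    """Return (by_predicate, by_object) for a set of (object, predicate) pairs.

    by_predicate[p] is the simplex of objects satisfying p;
    by_object[a] is the simplex of predicates satisfied by a.
    """
    by_predicate = defaultdict(set)
    by_object = defaultdict(set)
    for obj, pred in relation:
        by_predicate[pred].add(obj)
        by_object[obj].add(pred)
    return dict(by_predicate), dict(by_object)
```

Each returned set generates a simplex of the corresponding complex; the full ASC is obtained by also including all nonempty subsets of these sets.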


Example 5.1: 1-ary predicate 


The belief “Per, Hans, and Leif are male” is represented as follows: 
(Per, male), (Hans, male), (Leif, male) ∈ R 

where male is the predicate “is male.” The simplex {Per, Hans, Leif} is associated to the predicate 
male,† as shown in Figure 5.2. 

* In the rest of the chapter, we drop the reference to the predicate and object sets and just refer to the complex as $\Delta$. 
† From here the simplex will just be referred to as the predicate itself. 

Figure 5.2 1-ary predicate. 


5.2.3 Path Algebra 


An algebra is a collection of mathematical objects along with specific rules for their interactions. 
Algebras can provide languages in which problems of different types can be discussed. A path 
algebra in graph theory is a set of paths equipped with two binary operations that satisfy certain 
requirements. Path algebras are used to solve problems like finding optimal paths between two 
vertices [12], finding all paths emanating from a source or converging to a target [17], determining 
shortest and longest paths, etc. In [5], the authors applied simplicial complexes to the problem 
of solving IQ test analogy questions by selecting deformations that had fewer steps and preserved 
the most properties, but did not develop a rigorous algebra of these deformations or use them for 
reasoning. We define this algebra by extending path algebras from graph theory to ASCs so that it 
can be used to solve problems with analogies in a simplicial KR. 

First, we present some background notions from [12] about path algebras. A path algebra is 
defined to be a set P, with two binary operations $\vee$ and $\circ$ that satisfy the following properties: 

■ $\vee$ is idempotent, commutative, and associative. That is, for all $a, b, c \in P$, 

$$ \begin{gathered} a \vee a = a, \\ a \vee b = b \vee a, \\ (a \vee b) \vee c = a \vee (b \vee c). \end{gathered} $$

■ $\circ$ is associative as well as left and right distributive. For all $a, b, c \in P$, 

$$ \begin{gathered} (a \circ b) \circ c = a \circ (b \circ c), \\ a \circ (b \vee c) = (a \circ b) \vee (a \circ c), \\ (a \vee b) \circ c = (a \circ c) \vee (b \circ c). \end{gathered} $$

■ There exist elements $\varepsilon, \lambda \in P$ such that for any $a \in P$, 

$$ \begin{gathered} \varepsilon \circ a = a = a \circ \varepsilon, \\ \lambda \vee a = a = a \vee \lambda, \\ \lambda \circ a = \lambda. \end{gathered} $$

The operation $\vee$ is called the join and $\circ$ is called the product. The elements $\varepsilon$ and $\lambda$ are the units 
of the product and join operations, respectively. By defining the join operator to select the shortest 
path and the product as path concatenation, Manger shows that the closure (Definition 5.4) of the 
adjacency matrix of a graph yields the shortest path from node i to node j in its (i, j)th entry [12]. 
The join and product of two matrices are defined in terms of the join and product of paths. 

Definition 5.3 (Join and product) Let A, B be n × n matrices over P. Define the operations $\vee$ 
and $\circ$ on the matrices A, B: 

$$ A \vee B = [a_{ij} \vee b_{ij}] $$

$$ A \circ B = \left[ \bigvee_{k=1}^{n} a_{ik} \circ b_{kj} \right] $$

The powers of the matrix A are defined inductively with the product operation. 


Definition 5.4 (Stability index and closure) A square matrix A is said to be stable if for some 
integer q, 

$$ \bigvee_{k=1}^{q} A^k = \bigvee_{k=1}^{q+1} A^k $$

The smallest such q is called the stability index of A, and the join of the matrices $A^k$ as k ranges 
from 1 to q is called the (weak) closure of A, written $\hat{A}$: 

$$ \hat{A} = \bigvee_{k=1}^{q} A^k $$
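
A familiar instance of these definitions is the shortest-distance algebra, where the join is `min`, the product is `+`, the join unit is infinity, and the product unit is 0. The sketch below is a simplification of the path-valued algebra discussed above: it tracks path lengths rather than the paths themselves, and the function names and the `max_iter` guard are illustrative assumptions.

```python
INF = float("inf")  # unit of the join (min); absorbing for the product (+)


def join(A, B):
    """Entrywise join: keep the shorter of the two distances."""
    return [[min(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def product(A, B):
    """Matrix product with join = min and product = +."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def weak_closure(A, max_iter=1000):
    """Return (closure, stability_index) following Definition 5.4."""
    partial = A  # join of A^1 .. A^q
    power = A    # A^q
    for q in range(1, max_iter + 1):
        power = product(power, A)      # A^(q+1)
        extended = join(partial, power)
        if extended == partial:        # joins up to q and q+1 agree
            return partial, q
        partial = extended
    raise ValueError("matrix did not stabilize within max_iter steps")
```

For a three-node graph with edges 0→1 (weight 1), 1→2 (weight 2), and 0→2 (weight 5), the closure replaces the direct distance 5 with the shorter two-hop distance 3, and the stability index is 2.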


5.3 Tools for Representation 
Our goal is to extend the binary relation used in [4,5] in order to describe relationships that are 
more complex than the sharing of common properties. 

Example 5.2: Necessity of extension 

The belief “Per, Hans, and Leif are brothers” cannot be encoded as in Example 5.1 by just replacing 
the predicate; the statements “Per is a brother, Leif is a brother, and Hans is a brother” do not 
have the intended meaning. 


Instead, we extend the set A to a product of sets $A_1 \times A_2 \times \cdots \times A_m$ and define the predicate 
as follows. 

Definition 5.5 (m-Ary predicate) Let m be the minimal number of variables required to express 
a relationship. We define the m-ary predicate p on m variables to be the map from the product of 
sets $A_1 \times A_2 \times \cdots \times A_m$ to the set {True, False}: 

$$ p : A_1 \times A_2 \times \cdots \times A_m \to \{\text{True}, \text{False}\}. $$

Predicates with m = 1 are called properties. 

Figure 5.3 Binary predicate (vertex labels: Per × Leif, Per × Hans, Hans × Leif). 


Example 5.3: Binary predicates 


Per, Hans, and Leif are brothers. The following beliefs imply the desired belief: 


1. Per and Hans are brothers. 
2. Per and Leif are brothers. 
3. Leif and Hans are brothers. 


Figure 5.3 shows the topological representation of the brotherhood of Per, Hans, and Leif.* 
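
The difference between a property and a binary predicate can be illustrated with a few lines of code. This sketch is purely illustrative: the knowledge base `SIBLING_GROUPS`, the extra person "Ola", and the function names are hypothetical. The point is that the vertices of the resulting simplex are pairs drawn from a product of sets, not individual people.

```python
from itertools import combinations

# Hypothetical knowledge base: each set is one group of brothers.
SIBLING_GROUPS = [{"Per", "Hans", "Leif"}]


def brothers(a, b):
    """Binary predicate: maps a pair of people to True or False."""
    return a != b and any(a in g and b in g for g in SIBLING_GROUPS)


def predicate_simplex(candidates, predicate):
    """Vertices are the unordered pairs that satisfy the binary predicate."""
    return {pair for pair in combinations(sorted(candidates), 2)
            if predicate(*pair)}
```

Applied to Per, Hans, Leif, and the unrelated Ola, the function returns exactly the three pairs listed in Example 5.3.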


5.4 Tools for Reasoning 


The use of analogy in problem solving allows an intelligent system to connect different knowledge 
domains via method, strategy, or context to an unfamiliar problem. 

In this section, we develop an algebra for reasoning by analogy by defining a concept similar to 
a path called a facet chain. We discuss how this algebra can be used to find a “best” facet chain, 
and then introduce an algorithm for reasoning. 


5.4.1 Facet Chain Algebra 


We begin by creating some basic terminology about facet chains and define the operations necessary 
for creating an algebra. 


Definition 5.6 (Links and size) Let $\mathcal{F}$ be the collection of facets of a finite ASC. We define a 
link to be a set containing two facets $L_{i,j} = \{F_i, F_j\}$. A connected link is a link whose facets are 
adjacent. The size $|L_{i,j}|$ of the link $L_{i,j}$ is the number of vertices in the intersection $F_i \cap F_j$. The link 
$\{F_i, F_i\} = \{F_i\} = \varepsilon_i$ is called a trivial link. For convenience, we sometimes write $L_{i,j}^{|L_{i,j}|}$ to denote 
the link along with its size. 


Consider an alphabet $\mathcal{L}$ as the set of links of a finite ASC. Then a word over $\mathcal{L}$ is a sequence of
links; the set of words over $\mathcal{L}$ is denoted $\mathcal{L}^*$.


* Together, the statements exhibit logical dependencies; e.g., (1), (2) $\Rightarrow$ (3).
Definition 5.7 (Facet chains and length) We refer to an element $\Gamma \in \mathcal{L}^*$ as a facet chain.* A
connected facet chain is a facet chain whose links are all connected. The length of a facet chain is
written as $|\Gamma|$ and is equal to the number of its nontrivial links. We write a facet chain from $F_i$ to
$F_j$ as $\Gamma(i, j)$ and the set of facet chains from $F_i$ to $F_j$ as $C_{i,j}$.


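Definition 5.8 (Critical size) Let $\Gamma \in \mathcal{L}^*$. We refer to the link in $\Gamma$ with the smallest size as the
critical link of $\Gamma$ and its size as the critical size $|\Gamma|_c$:


$$|\Gamma|_c = \min\{\, |L_i| : i = 1, \ldots, |\Gamma| \,\},$$

where $L_i$ denotes the $i$th link of $\Gamma$.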


Definition 5.9 (Aspect ratio) Let $\Gamma \in \mathcal{L}^*$. Define the aspect ratio AR of $\Gamma$ to be the ratio of its
length to its critical size:


$$\mathrm{AR}(\Gamma) = \frac{|\Gamma|}{|\Gamma|_c}$$


If the denominator is zero, define the aspect ratio to be $\infty$.
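The quantities in Definitions 5.6 through 5.9 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: a chain is represented as a sequence of facet indices (consecutive indices form its links), and the four facets in `FACETS` are assumed vertex sets chosen only for the example.

```python
import math

# Illustrative facets (vertex sets); an assumption made for this sketch.
FACETS = {
    1: frozenset({1, 2, 3}),
    2: frozenset({2, 3, 4}),
    3: frozenset({3, 4, 5}),
    4: frozenset({4, 5, 6}),
}


def link_size(i: int, j: int) -> int:
    """Size of the link {F_i, F_j}: the number of shared vertices."""
    return len(FACETS[i] & FACETS[j])


def chain_links(chain):
    """Links of a chain given as a sequence of at least two facet indices."""
    return list(zip(chain, chain[1:]))


def length(chain) -> int:
    """Number of nontrivial links (links joining two distinct facets)."""
    return sum(1 for i, j in chain_links(chain) if i != j)


def critical_size(chain) -> int:
    """Smallest link size along the chain."""
    return min(link_size(i, j) for i, j in chain_links(chain))


def aspect_ratio(chain) -> float:
    """Length divided by critical size; infinite if the critical size is zero."""
    c = critical_size(chain)
    return math.inf if c == 0 else length(chain) / c


print(aspect_ratio([1, 2, 3, 4]))  # 1.5
print(aspect_ratio([1, 2, 4]))     # 2.0
print(aspect_ratio([1, 4]))        # inf
```

The last call shows the zero-denominator case: facets 1 and 4 share no vertex, so the single link has size zero and the aspect ratio is infinite.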


Depending on the ASC, there can be many facet chains connecting two facets. We develop the 
chain operations join and product along with a norm on chains to select the desired facet chain. 


Definition 5.10 (Chain norm) Let $\alpha$ be a positive rational number, $\alpha \in \mathbb{Q}^+$. Define the
function $\rho_\alpha$ to assign to every chain $\Gamma$ in $\mathcal{L}^*$ a value in $\mathbb{Q}^+ \cup \{\infty\}$:


Pa T) = x- ART) 


The chain norm allows the join to select chains based on two requirements: short length and
large critical size. For low aspect ratios, it prioritizes the length requirement, while for high
aspect ratios it prioritizes the size. The constant $\alpha$ determines the aspect ratio at which this
distinction occurs.


Remark 5.1 (Connected facet chains) The norm $\rho_\alpha(\Gamma)$ is finite if and only if $\Gamma$ is connected
and has finite length.


Remark 5.2 (F is a metric space) The set of facets $F$ is a metric space with distance between
two facets defined as $d(F_i, F_j) = \min\{\rho_\alpha(\Gamma(i, j)) : \Gamma(i, j) \in C_{i,j}\}$.


Now we can define the operations join and product for facet chains. 


* Our definition of a facet chain differs from a chain in algebraic topology, which is defined as a linear combination 
of simplices. 
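Definition 5.11 (Join and product of facet chains) Let $\Gamma(i, j)$, $\Gamma(k, l)$ be two facet chains in
$\mathcal{L}^*$. Define the join of $\Gamma(i, j)$ and $\Gamma(k, l)$ to be the one with the smallest norm:


$$
\Gamma(i, j) \vee \Gamma(k, l) =
\begin{cases}
\Gamma(i, j) & : \rho_\alpha(\Gamma(i, j)) < \rho_\alpha(\Gamma(k, l)) \\
\Gamma(k, l) & : \rho_\alpha(\Gamma(k, l)) < \rho_\alpha(\Gamma(i, j)) \\
\Gamma(i, j) & : \rho_\alpha(\Gamma(k, l)) = \rho_\alpha(\Gamma(i, j)),\ i < k
\end{cases}
$$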


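Now, we define the product of $\Gamma(i, j)$ and $\Gamma(k, l)$ to be the concatenation of the two facet chains
when the last link of $\Gamma(i, j)$ equals the first link of $\Gamma(k, l)$:


$$
\Gamma(i, j) \circ \Gamma(k, l) =
\begin{cases}
\Gamma(i, j) * \Gamma(k, l) & : j = k \\
\Lambda & : \text{else}
\end{cases}
$$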


The set of chains $\mathcal{L}^*$ along with the operations $\circ$ and $\vee$ forms the facet chain algebra.
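The two operations can be sketched as plain functions. In this hedged sketch a chain is a tuple of facet indices, the chain norm is passed in as a callable, and the non-matching case of the product is represented by `None`; all three choices are assumptions made for illustration. The tie-breaking rule for equal first indices is also an assumption, since the definition only covers the case $i < k$. The function `length_only` is a hypothetical stand-in that ignores critical size and is not the chain norm $\rho_\alpha$ of Definition 5.10.

```python
from typing import Callable, Optional, Tuple

Chain = Tuple[int, ...]          # facet indices visited, e.g. (1, 2, 4)
Norm = Callable[[Chain], float]  # stand-in for the chain norm


def join(a: Chain, b: Chain, norm: Norm) -> Chain:
    """Return the chain with the smaller norm; on a tie, prefer the chain
    whose first facet index is not larger (assumed extension of i < k)."""
    na, nb = norm(a), norm(b)
    if na < nb:
        return a
    if nb < na:
        return b
    return a if a[0] <= b[0] else b


def product(a: Chain, b: Chain) -> Optional[Chain]:
    """Concatenate a and b when a ends where b starts; otherwise None."""
    if a[-1] != b[0]:
        return None
    return a + b[1:]


def length_only(chain: Chain) -> float:
    """Hypothetical stand-in norm: the number of links in the chain."""
    return float(len(chain) - 1)


print(product((1, 2), (2, 4)))                     # (1, 2, 4)
print(product((1, 2), (3, 4)))                     # None
print(join((1, 2, 3, 4), (1, 2, 4), length_only))  # (1, 2, 4)
```

Because the norm is a parameter, the same `join` can be reused with any norm that also accounts for critical size; with such a norm the selected chain may differ from the one chosen by `length_only`.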


5.4.2 Finding the Best Facet Chain 


Recall that Definition 5.3 extends the operations of join and product to square matrices. Applied
to the aforementioned definitions, the join operation $\bigvee_{k=1}^{n} \Gamma(i, k) \circ \Gamma(k, j)$ operates on the set of
chains in $C_{i,j}$. From Definition 5.4, the closure of the adjacency matrix can yield useful information
about paths from one vertex to another. We now define two matrices that allow us to quickly
compute the best chain between two facets; the first plays a similar role as the adjacency matrix in
graph theory, and the second encodes the instructions for following the chain.


Definition 5.12 (Link matrix) Let $L$ be the square matrix of connected links:

$$L = \left[ L_{i,j}^{|L_{i,j}|} \right]$$


We can easily think of a chain $\Gamma(i, j)$ as transforming the facet $F_i$ into $F_j$. However, this
transformation requires knowledge of all the facets involved. In a situation where a system must
learn from an analogy between two domains, we might not have this complete knowledge in the
target domain. By considering a sequence of vertex permutations that correspond to the links in
the chain $\Gamma(i, j)$, we are able to obtain an explicit sequence of instructions that is independent of
any facet knowledge. If the size of a link is one less than the dimension of the facets, then there is
only one possible permutation as each facet has only one vertex that is not in their intersection. If it
is smaller than this, then there are several permutations that change the facets as desired. However,
since we are only dealing with combinatorial information, each facet is viewed as a set and sets are
invariant under ordering. We consider two permutations to be equivalent if they both correspond
to the same link. Each equivalence class is represented by one of its elements.


Definition 5.13 (Permutation matrix) To a link $L_{i,j}$ between facets $F_i$ and $F_j$, associate the
permutation $\sigma_{i,j} = (i_1\, j_1)(i_2\, j_2) \cdots (i_n\, j_n)$, where $n = |F_i \setminus (F_i \cap F_j)|$ and the vertices
$i_1, \ldots, i_n \in F_i$ and $j_1, \ldots, j_n \in F_j$.

Define the matrix $P = [\sigma_{i,j}]$ to be the matrix of these permutations.
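A short sketch of Definition 5.13 follows. The helper names `link_permutation` and `apply_permutation` are hypothetical, and pairing the sorted vertices is just one assumed way to pick a representative of the equivalence class; any other pairing of the same vertices corresponds to the same link.

```python
def link_permutation(f_i, f_j):
    """Transpositions (a, b) with a only in F_i and b only in F_j.
    Pairing the sorted vertices picks one representative of the class."""
    only_i = sorted(set(f_i) - set(f_j))
    only_j = sorted(set(f_j) - set(f_i))
    if len(only_i) != len(only_j):
        raise ValueError("facets must have the same number of vertices")
    return list(zip(only_i, only_j))


def apply_permutation(facet, transpositions):
    """Apply disjoint transpositions to a facet viewed as a set of vertices."""
    swap = {}
    for a, b in transpositions:
        swap[a], swap[b] = b, a
    return frozenset(swap.get(v, v) for v in facet)


sigma = link_permutation({1, 2, 3}, {2, 3, 4})
print(sigma)                                                   # [(1, 4)]
print(sorted(apply_permutation({1, 2, 3}, sigma)))             # [2, 3, 4]
print(sorted(apply_permutation({2, 3, 4}, [(2, 6), (3, 5)])))  # [4, 5, 6]
```

When two facets share all but one vertex, the permutation is a single transposition, which matches the remark that only one permutation is possible in that case.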
Figure 5.4 Example ASC with seven facets.


Proposition 5.1 The $(i, j)$th entry of the closure of the link matrix $L$ is the chain from $F_i$ to $F_j$
with the smallest chain norm, and the $(i, j)$th entry of the closure of $P$ is a sequence of permutations
on vertices that, when applied to the facet $F_i$, will sequentially transform it into $F_j$. The $i$th column
of the closure of $L$ represents all possible best facet chains originating at $F_i$, and the $j$th row
represents all best facet chains terminating at $F_j$.


Therefore, the closure of $L$ tells us, for every two facets in the ASC, which facet chain takes the
fewest steps and preserves the most properties during each step; the closure of $P$ tells us
how to actually perform the corresponding transformation.


Example 5.4: 


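Let A be the simplicial complex shown in Figure 5.4. The matrix $L$ is given as follows:


$$
L = \begin{bmatrix}
e_1 & L_{1,2}^{2} & L_{1,3}^{1} & 0 & L_{1,5}^{2} & L_{1,6}^{1} & 0 \\
L_{2,1}^{2} & e_2 & L_{2,3}^{2} & L_{2,4}^{1} & L_{2,5}^{1} & 0 & 0 \\
L_{3,1}^{1} & L_{3,2}^{2} & e_3 & L_{3,4}^{2} & L_{3,5}^{1} & 0 & L_{3,7}^{1} \\
0 & L_{4,2}^{1} & L_{4,3}^{2} & e_4 & 0 & 0 & L_{4,7}^{1} \\
L_{5,1}^{2} & L_{5,2}^{1} & L_{5,3}^{1} & 0 & e_5 & L_{5,6}^{2} & L_{5,7}^{1} \\
L_{6,1}^{1} & 0 & 0 & 0 & L_{6,5}^{2} & e_6 & L_{6,7}^{2} \\
0 & 0 & L_{7,3}^{1} & L_{7,4}^{1} & L_{7,5}^{1} & L_{7,6}^{2} & e_7
\end{bmatrix}
$$

Similarly, we have the matrix $P$ shown in Equation 5.1:

$$
P = \begin{bmatrix}
e & (1\,4) & (1\,5)(2\,4) & \cdots & (2\,8) & (2\,7)(3\,8) & \cdots \\
(4\,1) & e & (2\,5) & (2\,6)(3\,5) & (2\,8)(4\,1) & \cdots & \cdots \\
(1\,5)(2\,4) & (2\,5) & e & (3\,6) & (4\,8)(5\,1) & \cdots & (4\,7)(3\,8) \\
\cdots & (2\,6)(3\,5) & (6\,3) & e & \cdots & \cdots & (4\,7)(6\,8) \\
(2\,8) & (2\,8)(4\,1) & (4\,8)(5\,1) & \cdots & e & (3\,7) & (3\,5)(1\,7) \\
(2\,7)(3\,8) & \cdots & \cdots & \cdots & (3\,7) & e & (1\,5) \\
\cdots & \cdots & (4\,7)(3\,8) & (4\,7)(6\,8) & (3\,5)(1\,7) & (1\,5) & e
\end{bmatrix}
\tag{5.1}
$$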


The elements of P written as three dots are permutations corresponding to nonconnected 
links called teleportations.* These are the links of zero size, which are suppressed for clarity. 


* Rodriguez refers to disconnected paths in graph theory as teleportations, and we extend the terminology
here [17].




Teleportation chains have potential utility for reasoning with hypothetical knowledge of uncertain 
or future events. 


Remark 5.3 (Norm of teleportations) The norm of a teleportation is always $\infty$. This can be
easily seen, as the critical size of a teleportation is zero. Since teleportations have infinite norms,
they are never selected by the join when a connected link is available.

There can be many facet chains connecting a pair of facets. For example, $\Gamma_a(1, 4)$ and $\Gamma_b(1, 4)$ are
two facet chains* connecting $F_1$ to $F_4$:


$$\Gamma_a = L_{1,2}^{2}\, L_{2,3}^{2}\, L_{3,4}^{2}$$


$$\Gamma_b = L_{1,2}^{2}\, L_{2,4}^{1}$$


Since the aspect ratios of the two facet chains are greater than one, the chain norm for $\alpha = 1$ gives
priority to the size of the critical link. The corresponding chain norms are shown as follows:


01 (Ta) = 


SES 


Pı) = 


The chain $\Gamma_a$ has the smaller chain norm of the two: it has length three and critical size two,
whereas $\Gamma_b$ has length two and critical size one. This yields $\rho_1(\Gamma_a) < \rho_1(\Gamma_b)$, so
$\Gamma_a \vee \Gamma_b = \Gamma_a$.


5.4.3 Reasoning Algorithm 


The reasoning process starts with the ASC representing the target domain called the target complex.
The target complex is where the problem statement can be represented. The system may not have
complete knowledge of the target domain and may only know the initial target facet and a few
vertices of the final target facet. The next step is to find a suitable source complex that is rich enough
to contain strategies relevant to the problem. The answers to questions posed in the target complex
should then be inferred from reasoning in the source complex (see Figure 5.5). The problem of
analogy-finding can be formulated as follows: the source complex, initial target facet, and the
correspondence $\varphi$ between the two complexes are known, as well as a partial vertex set of the final
target facet; we want to determine the best facet chain that connects the initial target facet with a
final target facet containing the partial vertex set. In this work, we assume $\varphi$ is a bijection from the
target complex to a subset of the source complex.

In practice, $\alpha$ can be chosen based on the specifics of the ASC. For example, choosing $\alpha$ to be
smaller than the aspect ratios of all facet chains in the system would result in reasoning that prefers
keeping as many connections as possible at the expense of using more steps. This reasoning is more
cautious in a sense. Selecting $\alpha$ to be large would result in more hasty reasoning that prefers quick
solutions with few steps, at the expense of maintaining fewer connections.


* From here, we drop the reference to the facets $F_1$ and $F_4$.


Figure 5.5 The inverse map $\varphi^{-1}$ sends the solution $\Gamma$ from the source to the target.


The analogy-finding algorithm consists of four steps*:


1. Map the initial target facet and the partial final target vertex set, respectively $F_i^T$ and $V_f^T$, to the
   initial source facet and partial final source vertex set, respectively:

   $\varphi(F_i^T) = F_i^S$
   $\varphi(V_f^T) = V_f^S$

2. Find the set of facet chains $C^S$ that start at $F_i^S$ and end at a facet containing $V_f^S$. This is
   done by taking entries of the $i$th column of the closure of the link matrix $L$ whose last facets
   contain $V_f^S$. The join of these facet chains is the desired facet chain $\Gamma^S$.
3. Map the sequence of permutations corresponding to the chain from Step 2 to the target
   complex:

   $\varphi^{-1}(\sigma^S) = \sigma^T$

4. Apply the permutations in order to the initial target facet until the resulting facet contains
   $V_f^T$. This sequence of facets is the desired facet chain in the target complex.
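Steps 3 and 4 can be sketched as follows, assuming the source chain from Step 2 is already available as a list of permutations. The dictionary `phi`, the vertex labels, and the function names are hypothetical; the sketch also assumes `phi` is a bijection on the vertices involved so that it can be inverted.

```python
def map_back(permutations, phi):
    """Step 3: pull source transpositions back through the inverse of phi."""
    inverse = {source: target for target, source in phi.items()}
    return [[(inverse[a], inverse[b]) for a, b in sigma] for sigma in permutations]


def apply(facet, sigma):
    """Apply disjoint transpositions to a facet viewed as a set of vertices."""
    swap = {}
    for a, b in sigma:
        swap[a], swap[b] = b, a
    return frozenset(swap.get(v, v) for v in facet)


def follow(initial_facet, permutations, goal_vertices):
    """Step 4: apply permutations in order until the facet contains the goal."""
    facet = frozenset(initial_facet)
    visited = [facet]
    for sigma in permutations:
        if goal_vertices <= facet:
            break
        facet = apply(facet, sigma)
        visited.append(facet)
    return visited


# Hypothetical correspondence: target vertices a..e map to source vertices 1..5.
phi = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
source_permutations = [[(1, 4)], [(2, 5)]]        # assumed output of Step 2
target_permutations = map_back(source_permutations, phi)
path = follow({"a", "b", "c"}, target_permutations, frozenset({"e"}))
print([sorted(f) for f in path])
# [['a', 'b', 'c'], ['b', 'c', 'd'], ['c', 'd', 'e']]
```

The returned list of facets is the chain in the target complex; it stops early if an intermediate facet already contains the required vertices.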


5.5 Sensing System Design for Power Systems 


The integration of distributed energy resources in power grids and the development of smart grids 
have raised several challenges such as power grid monitoring and control. Sensing systems play a 
vital role in solving these challenges and in improving the power grid reliability. When sensors 
experience hardware or software failures, the resulting contingencies and missing data can cause 
problems with the data processing algorithms utilized in power systems [16]. We refer to the sensors 
with missing or erroneous data as lost sensors. 


* If the final target facet is known, use $F_f^T$ in place of $V_f^T$.


Sensors can be the primary sources of failure. Missing or erroneous data sent to the relays or
controls can lead to false tripping of circuit breakers or other switching devices, which may lead to 
blackouts. Sequences of fault events like lost sensors or line sags can be recorded before the blackout 
happens [15]. If these events are identified quickly and sensor failures are corrected efficiently, 
major blackouts could be avoided. 

To handle this rapid identification of events, we propose to integrate an intelligent SMS with 
the control and monitoring systems, allowing tasks or services to be accomplished as well as the 
selection of optimal sensor teams needed. We consider a service to be the monitoring and detection 
of fault events, and the correction and reconstruction of erroneous or missing data. The SMS should 
feature communication among sensors with the goal of identifying fault events and correcting errors 
in data that lead to contingencies. 

The integration of the SMS with the sensing system will increase the system resilience and 
reliability. The SMS can be developed as a software layer in the sensing system, which selects the 
teams of sensors needed for tasks, checks the health of sensors, corrects and reconstructs data, and 
infers decisions to controllers. 

In the following section, we illustrate the reasoning process with the IEEE 14 bus system 
(Figure 5.6). We have chosen this case as a simple example for illustration purposes. Although there 
are simpler techniques that can accomplish the computations shown, our contribution is that this 
structure of reasoning can be extended to much more difficult problems where those methods fail 
and flexible reasoning is needed. 


Figure 5.6 IEEE 14 bus system (AEP 14 bus test system bus code diagram; legend: generators,
synchronous condensers, three-winding transformer equivalent).




5.5.1 Development of Knowledge 


In this application, we develop sensing services performed by a given team $T_i$ of sensors that are able
to detect fault events and reconstruct data from lost sensors. The proposed KR contains information 
about the services needed for the identification of sensor failures. These services use teams of sensors 
in a given power line and its adjacent lines. Each service records data to be processed in order to 
detect and locate the failure. 


5.5.1.1 Redundancy and Agreement among Sensors 


In order to monitor the sensor network, sensors are checked using their redundancy. The verification
of sensor values is done using Kirchhoff's Current Law (KCL) or Voltage Law (KVL). For the
examples in this chapter, we use KCL. Consider the sensors at buses 1, ..., 5 in the IEEE-14 bus
system as shown in Figure 5.7. At each transmission line $L_{i,j}$, two sensors are placed at the head
and the tail, respectively. A sensor $s_{i,j}$ is indexed such that $i$ corresponds to the nearest bus and $j$
corresponds to the remaining adjacent bus. The current measured by sensor $s_{i,j}$ is written $I_{i,j}$ and
the current measured by sensor $s_{j,i}$ is $I_{j,i}$. The reference direction of current is defined to be toward
the higher numbered bus, toward the load, or away from the generator. Since sensors $s_{1,2}$ and
$s_{2,1}$ are on the same line, a redundancy check can be done without any computation and simply
amounts to checking if $I_{i,j} = I_{j,i}$.


Definition 5.14 (Directly and indirectly redundant teams) Assuming non-fault conditions, 
we refer to a team of sensors as a directly redundant team if the sensors are all duplicates or backups 
of each other. For current measurements, this is true only for sensors that are all on the same line. 
Applying KCL or KVL to a network yields a system of linear equations. If a team of sensors satisfies
exactly one of these equations, it is called an indirectly redundant team. 


Figure 5.7 Example of team coverage and redundancy.
Example 5.5: Directly and indirectly redundant teams


In Figure 5.7, the team of current sensors $T_1 = \{s_{1,2}, s_{2,1}\}$ is directly redundant, while the team
$T_2 = \{s_{2,1}, s_{2,3}, s_{2,4}, s_{2,5}, s_{2,L}, s_{2,G}\}$ is indirectly redundant. The second case can be seen from
applying KCL to bus 2 in Figure 5.7. We have


$$I_{2,1} + I_{2,G} = I_{2,3} + I_{2,4} + I_{2,5} + I_{2,L}. \tag{5.2}$$


The variable $I_{2,1}$ is the current measured by sensor $s_{2,1}$ and is defined to flow from bus 1 to
bus 2.
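Both kinds of redundancy check reduce to simple arithmetic on the measured currents. The sketch below is illustrative only: the dictionary keys, the tolerance `tol`, and the numeric readings are assumptions made up for the example and are not simulation data from the chapter.

```python
def directly_redundant_agree(i_12: float, i_21: float, tol: float = 1e-6) -> bool:
    """Direct check: two sensors on the same line should report the same current."""
    return abs(i_12 - i_21) <= tol


def kcl_residual_bus2(i: dict) -> float:
    """Residual of Equation 5.2: current into bus 2 minus current out of bus 2."""
    inflow = i["2,1"] + i["2,G"]
    outflow = i["2,3"] + i["2,4"] + i["2,5"] + i["2,L"]
    return inflow - outflow


def indirectly_redundant_agree(i: dict, tol: float = 1e-6) -> bool:
    """Indirect check: the KCL residual should vanish within the tolerance."""
    return abs(kcl_residual_bus2(i)) <= tol


# Made-up readings (per unit) that satisfy KCL at bus 2.
readings = {"2,1": 1.0, "2,G": 0.5, "2,3": 0.4, "2,4": 0.3, "2,5": 0.2, "2,L": 0.6}
print(directly_redundant_agree(1.0, 1.0))    # True
print(indirectly_redundant_agree(readings))  # True

# A faulty reading at s_{2,3} breaks the indirect check.
faulty = dict(readings, **{"2,3": 0.0})
print(indirectly_redundant_agree(faulty))    # False
```

A tolerance is used instead of exact equality because real measurements carry noise; the value chosen here is arbitrary.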


Definition 5.15 (Composite redundancy) We say a team satisfies a composite redundancy if it 
is a union of directly and indirectly redundant teams. 


Example 5.6: 


In Figure 5.7, the team $T_3 = \{s_{1,2}, s_{2,1}, s_{2,3}, s_{2,4}, s_{2,5}, s_{2,L}, s_{2,G}\}$ is composite as it is a union of $T_1$
and $T_2$. However, $T_4 = \{s_{1,5}, s_{1,2}, s_{2,1}, s_{2,5}, s_{2,4}\}$ is not composite. Every sensor in a composite
team can be validated by and contribute to a redundancy check. In order to make $T_4$ composite,
we would either have to add $s_{2,3}$ or remove $s_{2,4}$ and $s_{2,5}$.
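One way to test Definition 5.15 mechanically is to check whether a team equals the union of the known redundant teams it contains. This is a simplified sketch: the function name `is_composite` and the string labels for sensors are assumptions, and the list of known redundant teams below contains only the two teams from Example 5.5.

```python
def is_composite(team, redundant_teams) -> bool:
    """True if `team` equals the union of the redundant teams it contains."""
    team = frozenset(team)
    contained = [frozenset(t) for t in redundant_teams if frozenset(t) <= team]
    covered = frozenset().union(*contained)
    return len(contained) > 0 and covered == team


T1 = {"s12", "s21"}                                # directly redundant
T2 = {"s21", "s23", "s24", "s25", "s2L", "s2G"}    # indirectly redundant
known = [T1, T2]

T3 = T1 | T2
T4 = {"s15", "s12", "s21", "s25", "s24"}

print(is_composite(T3, known))  # True
print(is_composite(T4, known))  # False
```

In `T4` the sensors `s15`, `s25`, and `s24` are not covered by any complete redundant team, so they cannot take part in a redundancy check.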


5.5.1.2 Representation of Knowledge 


We begin by specifying the predicates of the KR and creating facets that satisfy these predicates. 
Since we are interested in the satisfaction of redundancies, we choose the two predicates*: “The 
team of sensors is directly redundant” and “The team of sensors is indirectly redundant.” We refer 
to their respective simplexes as direct and indirect. 


Example 5.7: 


Take the two sensor teams $\{s_{1,2}, s_{2,1}\}$ and $\{s_{1,G}, s_{1,5}, s_{1,2}\}$ from the network in Figure 5.7. The
two simplexes associated to these teams are shown in Figure 5.8.


Let $s_{i,j}$ and $s_{j,i}$ denote the sensors placed on $L_{i,j}$ at the head and tail of the line, respectively. In
order to design the ASC of sensors in power networks, we first assume that one or more sensors
are placed in every transmission line. This allows the power network to have full sensor coverage.
The ASC can then be developed in three steps: (1) we consider only the indirectly redundant teams,
and at each line $L_{i,j}$ we consider a possible measurement* (Figure 5.9a); (2) we add sensors in
directly redundant teams as shown in Figure 5.9b; and (3) we add simplices by connecting the new
vertex to the rest of the vertices (Figure 5.9c) so that each of the added simplices is itself an
indirectly redundant team.


Figure 5.8 (a) Simplex for a directly redundant team and (b) simplex for an indirectly redundant
team.


* The resulting equations such as Equation 5.2 can be seen as a modification of the facet ideal in [6].


Figure 5.9 Constructing the ASC for full coverage. (a) Construction of indirect simplices and
line identification. (b) Direct simplices are added. (c) ASC is completed by adding remaining
indirect simplices.

The result of step (1) is shown in Figure 5.10. Step (3) produces the ASC in Figure 5.11 whose 
simplices represent algebraic substitution of the direct redundancy into the indirect redundancy. 


5.5.1.3 Strategy Chains for Redundancy Checks 


Here we use facet chains to represent the strategy of checking sensors for contingency from the
knowledge of their redundancy. Each of the predicates in the KR represented in Figure 5.11 can
be used for redundancy checking, providing strategies for the monitoring service to detect sensor
failures. Consider $s_i$ as the service of the agreement check that is executed by the team
$T_2 = \{s_{2,1}, s_{2,5}, s_{2,4}, s_{2,3}, s_{2,L}, s_{2,G}\}$ of indirectly redundant sensors.

Given a lost sensor to recover data from, form a chain with the following steps:


1. Delete vertices from the ASC that do not correspond to existing sensors in the network.
2. If a simplex has more than one vertex removed, delete the simplex.
3. Compute the transitive closure of the link matrix of the ASC.
4. Pick the row of the transitive closure matrix corresponding to a simplex that contains the lost
   sensor vertex. From this row, select the entry with the smallest chain norm. Call this entry
   the optimal chain.


The resulting chain identifies the sensors whose data are needed for the reconstruction. 
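The first two steps of this procedure are purely combinatorial and can be sketched directly. The function name `prune_complex` and the string sensor labels are assumptions for illustration; the closure and norm-based selection of the later steps are not reproduced here.

```python
def prune_complex(simplices, existing_sensors):
    """Drop vertices without a physical sensor, then drop any simplex
    that lost more than one vertex in the process."""
    existing = frozenset(existing_sensors)
    kept = []
    for simplex in simplices:
        simplex = frozenset(simplex)
        remaining = simplex & existing
        removed = len(simplex) - len(remaining)
        if removed <= 1:
            kept.append(remaining)
    return kept


bus2 = {"s21", "s23", "s24", "s25", "s2L", "s2G"}   # indirect simplex at bus 2
line12 = {"s12", "s21"}                             # direct simplex on line 1-2

# Only s23 is missing: the bus 2 simplex loses one vertex and is kept.
one_missing = prune_complex([bus2, line12], (bus2 | line12) - {"s23"})
print(len(one_missing))  # 2

# s23 and s24 are missing: the bus 2 simplex loses two vertices and is deleted.
two_missing = prune_complex([bus2, line12], (bus2 | line12) - {"s23", "s24"})
print(len(two_missing))  # 1
```

A simplex with exactly one missing vertex is retained, which is consistent with the idea that a single unknown quantity can still be recovered from one constraint equation.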


* In the case of KCL, each simplex will correspond to a bus of the network. 


Figure 5.10 Simplicial complex of full sensor coverage in IEEE 14-bus system. 


Figure 5.11 Extended simplicial complex of full sensor coverage in IEEE-14 bus system. 
5.5.2 Reasoning Algorithm for Sensor Failure Detection
and Data Restoration


In order to detect sensor failures, the SMS will use a reasoning algorithm that utilizes the
sensors' redundancy to identify the faulty sensors and restore their data. The agreement of sensors
is tested by checking for satisfaction of the constraint equation. We say that a team of sensors
agrees if their measured quantities satisfy the given constraint equation, in this case one of
Kirchhoff's laws.


Example 5.8: Using Chains from Extended Complex 


Figure 5.12 shows hypothetical sensor failures in the complex from Figure 5.9. When one sensor
fails, the chain composed of the single link $\{F_1, F_2\}$ still provides redundancy. When another
sensor fails, both $s_{i,j}$ and $s_{j,i}$ can be computed by substitution. Denote the measured quantity at
sensor $s_{i,j}$ by $s^*_{i,j}$:


$$F_1 : s^*_{i,b} = s^*_{i,a} + s^*_{i,j} \tag{5.3}$$
$$F_2 : s^*_{i,j} = s^*_{j,c} + s^*_{j,d} \tag{5.4}$$
$$\{F_1, F_2\} : s^*_{i,b} = s^*_{i,a} + s^*_{j,c} + s^*_{j,d} \tag{5.5}$$


If they all agree, the algorithm declares that all sensors in the team are healthy, while if there
is a team whose sensors do not agree with each other, the algorithm will identify the sensors
that do not agree and they will be declared faulty. After a sensor is declared faulty, a process
of service restoration will begin. If a failure occurs in one sensor, the SMS will accommodate
the fault by isolating the faulty sensor and reconstructing its data from the remaining healthy
ones.


5.5.2.1 Simulation Results 


To validate our approach for the contingency test, the IEEE-14 bus system is simulated in a
MATLAB®/Simulink® environment with three different scenarios for sensor faults: failures in
the sensors $s_{1,2}$ at the time interval [0.1, 0.3s], $s_{2,1}$ at the time interval [0.4, 0.6s], and s2 at the
time interval [0.7, 0.9s].


Figure 5.12 (a) First sensor fails, but the chain still provides redundancy; (b) second sensor 
fails, the chain no longer provides redundancy, but the values of both sensors can still be 
reconstructed. 
Consider the following teams of sensors:

1. Composite: $T_1 = \{s_{1,2}, s_{2,1}, s_{2,3}, s_{2,4}, s_{2,5}, s_{2,L}, s_{2,G}\}$
2. Indirect: $T_2 = \{s_{2,1}, s_{2,3}, s_{2,4}, s_{2,5}, s_{2,L}, s_{2,G}\}$
3. Direct: $T_3 = \{s_{1,2}, s_{2,1}\}$


To test the agreement between sensors in a team $T_i$, we consider the following analytic relations:


A * * * * * E EI 
AT) : sí 53,1 — $33 T 52.4 — 52,5 — 52,1 — 92,6 = 9 


$$A(T_2) : s^*_{2,1} - s^*_{2,3} - s^*_{2,4} - s^*_{2,5} - s^*_{2,L} + s^*_{2,G} = 0$$
$$A(T_3) : s^*_{1,2} - s^*_{2,1} = 0$$


where $A(T_i)$ represents the agreement relation. The team $T_i$ is in agreement when the relation
$A(T_i)$ is satisfied; otherwise $T_i$ is in disagreement. If a team of sensors disagrees, then one or more
sensors in the team are faulty. Figure 5.13 shows the residues of the agreement test conducted in
teams $T_1$, $T_2$, and $T_3$. For the three fault scenarios, the agreement relations $A(T_1)$, $A(T_2)$, and
$A(T_3)$ are not satisfied and have non-zero residues when failures occur (Figure 5.13). However,
only team $T_1$ is able to detect the maximum number of failures.


Figure 5.13 Agreement test.


In order to identify which sensor has failed, we compare the agreement of the composite team
against the agreement of the direct and the indirect redundant teams:


■ At the time interval [0.1, 0.3s], team $T_2$ is in agreement, while teams $T_1$ and $T_3$ are not.
This means that the faulty sensor is an element of $T_1$ and $T_3$ and not an element of $T_2$.

■ At the time interval [0.4, 0.6s], none of the teams $T_1$, $T_2$, and $T_3$ are in agreement. So the
faulty sensor must be an element of $T_1$, $T_2$, and $T_3$.

■ At the time interval [0.7, 0.9s], team $T_3$ agrees, while teams $T_1$ and $T_2$ are not in agreement.
This means that the faulty sensor is an element of $T_1$ and $T_2$ and not an element of $T_3$.


This comparison allows us to identify sensor failures when they occur in power grids.
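The comparison in the three cases above amounts to set arithmetic on the teams. The following sketch assumes a single faulty sensor at a time; the function name `fault_candidates` and the string labels are hypothetical.

```python
TEAMS = {
    "T1": {"s12", "s21", "s23", "s24", "s25", "s2L", "s2G"},  # composite
    "T2": {"s21", "s23", "s24", "s25", "s2L", "s2G"},         # indirect
    "T3": {"s12", "s21"},                                     # direct
}


def fault_candidates(agrees: dict) -> set:
    """Sensors that lie in every disagreeing team and in no agreeing team."""
    disagreeing = [TEAMS[name] for name, ok in agrees.items() if not ok]
    agreeing = [TEAMS[name] for name, ok in agrees.items() if ok]
    if not disagreeing:
        return set()
    candidates = set.intersection(*disagreeing)
    for team in agreeing:
        candidates -= team
    return candidates


# Agreement pattern in [0.1, 0.3s]: only T2 agrees.
print(sorted(fault_candidates({"T1": False, "T2": True, "T3": False})))
# ['s12']

# Agreement pattern in [0.4, 0.6s]: no team agrees.
print(sorted(fault_candidates({"T1": False, "T2": False, "T3": False})))
# ['s21']

# Agreement pattern in [0.7, 0.9s]: only T3 agrees.
print(sorted(fault_candidates({"T1": False, "T2": False, "T3": True})))
# ['s23', 's24', 's25', 's2G', 's2L']
```

In the third case these three teams narrow the fault down to the bus 2 sensors outside the direct team but cannot single one out; further teams would be needed for that.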


5.5.2.2 Future Application: Using Analogies in the Reasoning 
Algorithm for Sensor Team Selection 


When not quickly and accurately detected, faults in a power network may lead to a cascade of
other failures; for example, a tripped transmission line can cause a transient that overloads lines in
other areas. Cascading failures in power systems are the major cause of large blackouts. Even simple
events such as line sag close to nearby trees can be the primary source of a cascading failure.

As power systems are generally large-scale distributed systems, redundancy checking can be 
computationally expensive.* To reduce this burden, the SMS should be able to select only specific 
teams to test the redundancy when a fault event occurs, instead of checking every redundancy. 
To achieve this objective, analogies can be a good candidate solution for the team selection. This 
solution will be based on the strategies of chains that have been used when some specific fault events 
occurred. 

In a power system, due to similar components or topologies, certain fault events can be fairly 
common and repeat themselves in different areas of the grid. By using analogies to identify 
these similar events, it may be possible to get a better understanding of the spatial and temporal 
distribution of similar faults. The knowledge of this distribution can be used by the SMS to select 
only the teams that are useful to detect the given fault type. 


5.6 Conclusion 
The integration of a new SMS with the capability of learning, updating knowledge, and making
decisions without human intervention can answer the control and monitoring needs of complex
engineered systems. The objective of this research is to use a CAT-based KR to develop reasoning
algorithms for a distributed intelligent SMS, which will be integrated as a software layer in the
control and monitoring system.

To achieve this objective, we developed a set of topological tools and a theory for KR and 
reasoning. The motivation behind developing an analogical reasoning process using this model is 
to allow an autonomous intelligent system to act according to structural comparisons among its 
beliefs about the world as opposed to just what is explicitly representable with logical rules. 

We use ASCs as a tool to model knowledge that will serve as a basis for reasoning operations.
ASCs have been chosen in this research because of their rich structure, which allows the encoding
of a large range of knowledge. These developed tools will have an impact beyond the field of sensing
and measurement systems, as the generality of our approach will yield a theory that can be used
in numerous branches of control, such as system health management, robotics, and unmanned
space exploration. The development of technologies (e.g., actuators, sensors, etc.) presents an
opportunity to develop a new intelligent control paradigm based on reasoning by analogy with the
goal of improving the capability of systems. Furthermore, some of the biggest obstacles in large-scale
complex systems are (1) uncertainty, (2) controllability, and (3) unpredictability because of the
system’s dynamic nature. The proposed tools for using analogies for problem solving in real-time
will go a long way in addressing these concerns.


* The presented method relies on matrix multiplication and so has complexity $O(n^3)$.
6.1 Introduction


Wireless sensor networks consist of a large number of small sensor devices that have the capability 
to take various measurements of their environment. These measurements include seismic, acoustic, 
magnetic, IR and video information. These devices are highly resource-constrained, equipped with 
a small processor and wireless communication antenna, and battery powered. To be used, sensors 
are scattered around a sensing field to collect information about their surroundings. For example, 
sensors can be used in a battlefield to gather information about enemy troops, detect events such 
as explosions, and track and localize targets. Upon deployment in a field, they form a wireless ad 
hoc network and communicate with each other and with data processing centers. 

A sensor network is typically expected to perform multiple tasks or missions. A mission, in this 
context, is any job that requires some amount of sensing resources to be accomplished such as video 
monitoring a field, tracking a target, or localizing an event. Missions can be divided into multiple 
sub-missions. For example, monitoring a large field can be divided into monitoring multiple smaller 
areas. Each mission can be modeled with a demand, which measures its need for sensing resources, 
and a profit, which represents its importance. In heterogeneous networks, the requirement of a 
mission is specified by its need for different types of sensors. For example, a mission may require 
video imaging and seismic data to identify an object. In such cases, sensors must be bundled
together and then assigned to missions. The utility (or amount/quality of information) that a sensor can
provide to a mission depends on several factors. These include the type of the sensor, its sensing 
range, its geographic location relative to the mission, and its current operational status, such as its 
remaining energy. 

Due to the limited number of sensors and the potentially large number of missions, competition 
will arise. In such cases, it might not be possible to satisfy the requirements of all missions using 
available sensors. Given all currently available information, the network should intelligently choose 
the “best” assignment of the available sensors to the missions to maximize the utility of the network. 
In this chapter, we discuss this sensor-mission assignment problem. 

Although certain types of sensors, such as seismic or acoustic sensors, can receive data from 
their local surroundings as a whole, other types of sensors need to be directed to a certain location 
such as those used for imaging purposes. In these cases, the direction of each sensor, and thus the 
mission it serves, must be chosen appropriately since it may only benefit one mission. The focus 
here will be on directional sensors as they typically pose more challenging problems. Almost all the 
problems considered here are NP-hard, and hence we discuss some heuristics that were proposed
in the literature to solve such problems. 


6.1.1 Problem Variants 


There are two broad settings in which sensors need to be assigned to missions: static and dynamic. 
In the static setting, all the missions need to be satisfied simultaneously. Hence, the information 
about all missions, including their profits and demands, is available at once when making the 
assignment decisions. This setting is useful when the system has a set of long-lived missions such 
as perimeter monitoring applications. It can also be useful in systems that are not expected to have 
fast response, in which case missions arriving at different times may be batched together and start 
at a single point in time. In the dynamic setting, missions start at different times and have different 
durations. This is a more practical setting in which a sensor network is expected to operate. In such 
a setting, the system is expected to have a prompt response to incoming missions and should be 
able to adapt quickly to changes. 

There are different constraints for the sensor-mission assignment problem that will be considered. 
In some environments, missions may have budget constraints to limit the number of sensors they 
can use; in this case, sensors will have an associated cost. In other environments, the lifetime of
sensors constrains the amount of utility they can provide.

Solutions we discuss in this chapter can be divided into two main categories: centralized and 
distributed. In a centralized solution, all assignment decisions are made by a single node (typically a 
base station that is deployed in the field) that has all the information about missions. To do this the 
base station collects all the required information about each sensor. Then, it runs a local algorithm 
to decide on the assignments and sends these assignments to the respective sensors. Due to its global 
view of the field, this approach can provide high quality solutions, but can be expensive in terms of 
communication overhead and introduces a single point of failure. In a distributed solution, sensors 
make these decisions on their own. In such an approach, mission information is disseminated to 
the network and individual nodes decide on the assignments. This is a more fault-tolerant approach 
and can be more efficient in terms of communication overhead, but might not provide as good a 
solution as the centralized approach. 


6.1.2 Related Work 


There has been some work in defining frameworks for sensor-mission assignment problems. For 
example, [3] defines a framework for the assignment problem in which the goal is to maximize the 
utility while staying under a predefined budget. However, the authors do not consider the case of 
competing missions. The general problem of sensor selection to achieve an objective has also received 
sizable attention lately. For example, in [10,13] the authors solve the coverage problem, which is a 
related problem, using the least number of sensors to conserve energy. Another related problem is 
to efficiently locate and track targets such as in [7,8,16]. A survey of the different sensor selection 
and assignment algorithms including theoretical models of the problem can be found in [11]. 

In this chapter, we discuss several sensor-mission assignment problems in the presence of 
different constraints. Both the static and dynamic settings are considered. We discuss different 
intelligent solutions to solve these problems based on centralized and distributed approaches. The 
main goal of these solutions is to intelligently assign sensing resources to the competing missions. 
Mainly, we focus on the model and solutions proposed in [12] and [6]. 


6.1.3 Network Model 


In the network model we consider in this chapter, it is assumed that a set of static sensors are 
predeployed in a field. Missions can arrive and depart over time. A mission is a primitive sensing 
task that requires information of a certain type, which may be contributed by one or more 
sensors. Each mission is defined by a specific geographic location. An example of a mission is 
video monitoring an area of interest. General missions that cover large areas, such as perimeter 
monitoring, can be divided into multiple missions each having its own location. The deployed 
sensors are directional in nature and hence each of them can be assigned to a single mission (i.e., 
directed to one location). The direction of a sensor can be changed when the assignment is changed. 
A video camera is a good example of such sensors. 


6.1.4 Roadmap 


The rest of this chapter is organized as follows: Section 6.2 introduces the sensor-mission assignment 
problem in the presence of competition and discusses different solutions. Section 6.3 extends the 
problem by adding a budget constraint in the static setting, while Section 6.4 considers 
lifetime constraints in dynamic environments. Finally, Section 6.5 concludes this chapter by looking 
at future research directions in this area. 


6.2 Basic Sensor-Mission Assignment 


As mentioned previously, missions may vary in both importance (profit) and difficulty (demand), 
and these properties need not be correlated. An ongoing surveillance mission, for example, may be 
expensive but of minor importance, whereas an urgent mission for information about one particular 
spot may be of low-demand but very important. In many applications, partial satisfaction will be 
no better than zero satisfaction. If the goal of a given mission is to reconstruct the 3D shape of an 
object, for example, then this may be accomplished with images from two cameras, but an image 
from just one camera will be useless. Indeed, accepting the single image could actually be harmful 
since the drain on the sensor's battery could preclude a future mission that might otherwise have 
been satisfiable. The model in [12] considers two profit functions: the first awards profit only for 
missions whose demands are fully met, and the second also awards profit for missions that reach a 
preset threshold. Hence the problem is to choose the “best” assignment of sensors to missions, in the 
sense that profits from satisfied missions are maximized. 

In some networks, there may simply be a static set of long-term missions, in which case the aspect 
of time may be eliminated. In other settings, mission arrivals and departures may be infrequent, so 
that for each block of time, sensor assignment can be solved as a static problem. Even in this static 
setting, this problem is computationally hard to solve optimally. Thus approximation algorithms 
and heuristics are used. 

A centralized approach to sensor assignment will collect all the relevant information at a 
central location for decision-making and then distribute assignments. Such an approach can 
be expensive in terms of communication overhead, however. Another approach is to have nodes 
make these assignment decisions locally, in a distributed manner, using mission information that 
is disseminated into the network. While this should decrease communication costs, a centralized 
algorithm may be able to guarantee a better solution. 

Since the problem (in its most general form) is NP-hard, we consider a constrained version for 
which approximation algorithms exist. We consider a geometric constraint: only sensors within 
a bounded sensing range from a mission can be assigned to that mission, which is a reasonable 
assumption in realistic settings. The problem is generalized further by allowing a mission to be 
successful even if its demand is not fully met. This is done by setting a threshold that specifies the 
minimum fraction of the demand to be met for a mission to succeed. In this case, the mission will 
not be awarded the full profit, but rather a fraction based on its satisfaction level. 

In this section, we consider the problem of assigning directional sensors to missions in wireless 
sensor networks. We study both centralized and distributed approaches to solving the dynamic 
problem. As energy is a critical resource in wireless sensor networks, we discuss an energy-aware 
extension to the distributed algorithm that extends network lifetime. This section is based on [12]. 


6.2.1 Problem Definition 


Here, we present the formal definition of the core sensor-mission assignment problem and extend 
it to the dynamic setting. 


6.2.1.1 Static Problem 


The core problem, which is called Semi-Matching with Demand (SMD) [1], is modeled as a weighted 
bipartite graph whose vertex sets consist of sensors $S = \{S_1, \ldots, S_n\}$ and missions 
$M = \{M_1, \ldots, M_m\}$ (see Figure 6.1). A sensor $S_i$ may be able, depending on its type and 
location, to provide mission $M_j$ with some data. A positively weighted edge $(S_i, M_j)$ means that 
$S_i$ is applicable to $M_j$. The weight of the edge $(S_i, M_j)$ is denoted by $e_{ij}$ and indicates the 
utility (or quality of information) that $S_i$ could contribute to $M_j$ if this assignment were chosen. 
The utility may vary depending on the sensor's type, location, or other properties. Also given is a 
positive-valued demand $d_j$ associated with each mission $M_j$, indicating the total utility the mission 
requires. The problem can be simplified by assuming that the utility amounts received by a mission 
are additive. That is, the total utility received by a mission is equal to the sum of the utilities 
provided by sensors assigned to it. While this may be realistic in some settings, such as sensing 
applications in which high-quality measurements can be obtained by, for example, either taking a 
single high-quality reading or averaging together several lower-quality readings, in others it is not; 
for the purpose of the discussion this assumption is sufficient. 


Figure 6.1 Modeling the problem as a bipartite graph, with sensors on one side and missions on the other. 

A solution to the problem would seek a semi-matching of sensors to missions, so that, ideally, 
each mission demand is satisfied. That is, a sensor may be assigned to at most one of the missions to 
which it is applicable, but a mission can accept utility from multiple sensors. Of course, satisfying 
all missions may not be feasible; in general, the goal is to maximize a weighted sum of the satisfied 
missions. Since there is a profit $p_j$ associated with achieving mission $M_j$, the goal becomes to 
maximize the total satisfied profits. 

The problem can be generalized by introducing the concept of a threshold. In this case, missions 
can be partially successful if they reach a minimum threshold of utility. The demand may now be 
interpreted as the total utility the mission desires. Profit for mission $M_j$ indicates the importance of 
the mission and is awarded based on the percentage of satisfied demand, but only if this percentage 
reaches a satisfaction threshold $T$; $p_j$ is the maximum profit receivable for mission $M_j$. The goal is 
to maximize total profits. 

The problem instance and goal are defined as follows: 

Instance: A global threshold $T \in [0, 1]$ and a weighted bipartite graph $G = (S, M, P, D, E)$, 
where $S = \{S_1, \ldots, S_n\}$ is a collection of sensors and $M = \{M_1, \ldots, M_m\}$ is a collection of 
missions; each mission $M_j$ is associated with a profit $p_j$ and a demand $d_j$; each edge in $S \times M$ 
has an edge-weight $e_{ij}$ indicating utility. 

Goal: Find a semi-matching $F \subseteq S \times M$ (no two chosen edges share the same sensor), in which 
$\sum_j p_j(u_j)$ is maximized, where $u_j$ is the total utility received by mission $M_j$ divided by demand $d_j$. 
The profit function of each mission $M_j$ is defined as follows: 


$$p_j(u_j) = \begin{cases} p_j, & \text{if } u_j \ge 1 \\ p_j \cdot u_j, & \text{if } T \le u_j < 1 \\ 0, & \text{if } u_j < T \end{cases}$$


The problem can be formulated as an integer program (IP). The following IP employs the 
decision variable $x_{ij}$ indicating whether sensor $S_i$ is assigned to mission $M_j$. Finding a solution can 
be seen as a two-step process: decide which missions to satisfy and then decide how to satisfy them. 
Each mission $M_j$ has a constraint requiring that the sum of utility received by $M_j$ be at least the 
value $T$, which is a user-defined variable. Here is the IP: 


Maximize: $\sum_j p_j(u_j)$ 
Such that: $\sum_i x_{ij} e_{ij} \ge T$, for each mission $M_j$, 
$\sum_j x_{ij} \le 1$, for each sensor $S_i$, and 
$x_{ij} \in \{0, 1\}$, for each variable $x_{ij}$ and $u_j \in [0, 1]$, for each variable $u_j$ 
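
As a minimal sketch of how the objective above can be evaluated for a candidate semi-matching, the following Python fragment assumes the threshold profit model described in the text: no profit below $T$, profit proportional to the satisfied fraction of demand between $T$ and 1, and never more than $p_j$. The function names `profit` and `objective` and the dictionary-based data layout are illustrative choices made here, not part of the model in [12].

```python
from typing import Dict, Tuple


def profit(p_j: float, u_j: float, T: float) -> float:
    """Assumed threshold profit: 0 below T, proportional up to 1, capped at p_j."""
    if u_j < T:
        return 0.0
    return p_j * min(u_j, 1.0)


def objective(assignment: Dict[int, int],
              e: Dict[Tuple[int, int], float],
              p: Dict[int, float],
              d: Dict[int, float],
              T: float) -> float:
    """Total profit of a semi-matching given as {sensor i: mission j}."""
    received = {j: 0.0 for j in p}          # utility collected per mission
    for i, j in assignment.items():
        received[j] += e.get((i, j), 0.0)   # a missing edge contributes nothing
    # u_j is the received utility divided by the mission's demand.
    return sum(profit(p[j], received[j] / d[j], T) for j in p)


# Hypothetical toy instance: three sensors, two missions.
e = {(0, 0): 0.6, (1, 0): 0.5, (2, 1): 0.3}
p = {0: 2.0, 1: 1.0}
d = {0: 1.0, 1: 1.0}
print(objective({0: 0, 1: 0, 2: 1}, e, p, d, T=0.5))  # 2.0
```

Representing the assignment as a map from sensor to mission enforces the semi-matching constraint by construction, since each sensor key can point to only one mission. In the toy instance, mission 1 reaches only 30% of its demand, which is below the threshold, so it contributes no profit.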


6.2.1.2 Dynamic Problem 


An orthogonal generalization of the original problem, in terms of time, is used to model more 
realistic scenarios in which missions arrive and depart over time. The problem statement is the 
same, except that now each mission is associated with a start time and an end time. A mission’s 
demand and maximum profit are constant over time. Awarded profit for a mission is computed 
at each discrete timestep, based on the satisfaction level at that instant. Total profit for a mission 
is simply the sum of the instantaneous profits. It is not required that a mission's demand be 
met over its entire lifetime in order to receive profit. The profit model is in this sense fractional 
in terms of time. The dynamic version is thus given essentially by the same mathematical program 
(MP) given in Section 6.2.1.1, except that each variable now has an additional 
time index. 

As a generalization of the static problem, previous hardness results apply also to the dynamic 
version. Indeed, a natural strategy for the dynamic problem is to solve the static problem at each 
timestep. 
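
The per-timestep strategy mentioned above can be sketched as a simple loop. This is an illustration only: `solve_dynamic`, the mission dictionaries with `start`, `end`, and `profit` keys, and the convention that the end time is exclusive are all assumptions made for the example. The argument `solve_static` stands for any static solver that returns the profit obtained at one timestep.

```python
from typing import Callable, Dict, List

Mission = Dict[str, float]  # hypothetical layout: {"start": ..., "end": ..., "profit": ...}


def solve_dynamic(missions: List[Mission], horizon: int,
                  solve_static: Callable[[List[Mission]], float]) -> float:
    """Sum instantaneous profits by solving a static instance at each timestep."""
    total = 0.0
    for t in range(horizon):
        # A mission is active from its start time up to (not including) its end time.
        active = [m for m in missions if m["start"] <= t < m["end"]]
        total += solve_static(active)
    return total


def all_satisfied(active: List[Mission]) -> float:
    """Stand-in static solver: pretends every active mission is fully satisfied."""
    return sum(m["profit"] for m in active)


missions = [
    {"start": 0, "end": 2, "profit": 1.0},
    {"start": 1, "end": 4, "profit": 2.0},
]
print(solve_dynamic(missions, horizon=4, solve_static=all_satisfied))  # 8.0
```

Because the total is a plain sum over timesteps, a mission that is satisfied for only part of its lifetime still contributes profit for those timesteps, which is the sense in which the profit model is fractional in time.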


6.2.2 Centralized Algorithm 


A greedy algorithm to solve this problem will repeatedly satisfy the most currently profitable mission, 
that is, the mission that can be satisfied with the greatest profit, using the currently available sensors. 
If $\bar{S} \subseteq S$ is the set of not-yet-assigned sensors (initially $\bar{S} = S$) and 
$u_j = \sum_{S_i \in \bar{S}} e_{ij}$, then the profit currently achievable by mission $M_j$ is $p_j(u_j)$. 
Of course, it may be that not all sensors are needed to achieve this profit; conversely, if the demand 
threshold is not met, this profit is 0. The algorithm repeatedly selects a mission $M_j$ of maximum 
current profitability, and then satisfies it with available sensors (which are removed from $\bar{S}$), 
in order of decreasing contribution value $e_{ij}$, until either $M_j$ is fully satisfied or all sensors 
with nonzero offers to $M_j$ have been used. When there are no remaining missions with nonzero current 
profitability, the algorithm completes. The running time of the algorithm as written is 
$O(mn(m + \log n))$, since there are at most $m$ iterations, each of which recomputes the $u_j$ values in 
$O(mn)$ time and sorts the sensors in $O(n \log n)$ time, but it is easy to improve this to $O(mn \log n)$ 
by updating the $u_j$ values over time rather than computing them from scratch. The details of the algorithm are 
as follows: 


Algorithm 6.1 Greedy algorithm 


INPUT: $S$, $M$, $e_{ij}\ \forall (S_i, M_j) \in S \times M$ and $p_j, d_j\ \forall M_j \in M$ 
while true 
    for each available mission $M_j$ 
        $u_j \leftarrow \sum_{S_i \text{ unused}} e_{ij}$ 
    $j \leftarrow \arg\max_j p_j(u_j)$ 
    if $p_j(u_j) = 0$ then break 
    for each unused $S_i$ in decreasing order of $e_{ij}$ 
        if $u_j \ge d_j$ or $e_{ij} = 0$ then break 
        assign $S_i$ to $M_j$ 
OUTPUT: sensor assignment 


6.2.3 Distributed Algorithms 


Although centralized algorithms such as the greedy algorithm discussed in the previous section 
may provide better solutions to the sensor-mission assignment problem due to their global view of 
the field, they can be expensive in terms of communication cost. Because a centralized algorithm 
requires global information about all sensors in the network such as their locations, utilities to the 
different missions and other stats, the number of messages required to be sent to the base station can 
become very large, especially in dense networks. This communication cost becomes even higher for 
dynamic environments in which missions arrive and depart at different points in time, requiring 
the base station to continually gather information about sensors in the field. 
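
To make the centralized baseline concrete before turning to the distributed protocols, the greedy procedure of Algorithm 6.1 can be sketched in Python as follows. This is a hedged illustration rather than the implementation evaluated in [12]: the names `greedy_assign` and `profit` are hypothetical, the profit function is the assumed threshold model (zero below $T$, proportional up to full satisfaction), the achievable utility is normalized by demand before the profit function is applied, and the inner loop tracks the utility collected so far explicitly.

```python
from typing import Dict, List


def profit(p_j: float, u_j: float, T: float) -> float:
    """Assumed threshold profit: 0 below T, proportional up to 1, capped at p_j."""
    return 0.0 if u_j < T else p_j * min(u_j, 1.0)


def greedy_assign(e: List[List[float]], p: List[float], d: List[float],
                  T: float) -> Dict[int, int]:
    """Return {sensor index: mission index}; e[i][j] is the utility of sensor i to mission j."""
    unused = set(range(len(e)))
    available = set(range(len(p)))
    assignment: Dict[int, int] = {}
    while available:
        # Profit each available mission could reach using every unused sensor.
        best_j, best_profit = None, 0.0
        for j in sorted(available):
            u_j = sum(e[i][j] for i in unused) / d[j]
            candidate = profit(p[j], u_j, T)
            if candidate > best_profit:
                best_j, best_profit = j, candidate
        if best_j is None:  # no remaining mission can earn any profit
            break
        collected = 0.0
        # Hand over sensors in decreasing order of contribution (ties broken by index).
        for i in sorted(unused, key=lambda s: (-e[s][best_j], s)):
            if collected >= d[best_j] or e[i][best_j] == 0:
                break
            assignment[i] = best_j
            unused.discard(i)
            collected += e[i][best_j]
        available.discard(best_j)
    return assignment


# Hypothetical toy instance: three sensors (rows), two missions (columns).
e = [[0.6, 0.0], [0.5, 0.4], [0.0, 0.7]]
print(greedy_assign(e, p=[2.0, 1.0], d=[1.0, 1.0], T=0.5))  # {0: 0, 1: 0, 2: 1}
```

In the toy instance the more profitable mission 0 is served first and takes sensors 0 and 1. Mission 1 is then left with sensor 2 only, which still meets the 50% threshold.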

To avoid this cost, distributed algorithms are developed to solve the problem. In such an 
approach, a mission leader is selected for each mission. This should be a sensor that is close to the 
mission’s location. Finding the leader can be done using geographic-based routing techniques such 
as [2] or [9]. The leaders are informed about the missions’ demands, profits, and locations by the 
base station. Then they run a local protocol to match nearby sensors to their respective missions. It 
is assumed that the contribution a sensor can provide to a mission is a function of the geographic 
distance between them and hence only nearby sensors are considered. In the following we discuss 
a multi-round proposal algorithm (MRPA) that works in static settings and is then adapted to 
dynamic cases. We also discuss an energy-aware extension to the dynamic algorithm that helps in 
prolonging the network lifetime [12]. 


6.2.3.1 Bidding Algorithm 


In this algorithm, each mission leader advertises its mission information (demand, profit, and 
location) to the nearby sensors by means of broadcast. If the advertisement message needs to be 
sent over multiple hops then neighboring sensors rebroadcast the message so their neighbors can 
hear it. The number of hops over which the advertisement message is sent depends on the relation 
between the communication range, which is the maximum distance over which two sensors can 
communicate, and the sensing range. If the sensing range is larger than the communication range 
then sensors that are further away should be notified. 

When nearby nodes receive mission advertisements from one or more missions, they decide on 
which mission(s) to bid. To achieve the highest possible profit, the bidding price ($B_{ij}$) that sensor 
$S_i$ sends to mission $M_j$ is set to the product of the sensor-mission contribution and the mission's 
profit. Using the notation of Section 6.2.1, $B_{ij} = (e_{ij}/d_j) \times p_j$. The sensor sorts the bidding prices 
in decreasing order and sends bids to the first $N$ missions, where $N$ is a protocol parameter. 

Mission leaders wait for some time to receive all bids; then they select the best sensors for their 
needs and send them assignment messages. A sensor is assigned to the first mission it receives an 
assignment message from. In this algorithm, missions compete for sensors. Once a mission leader 
selects a sensor, other mission leaders competing for that sensor are notified that it is no longer 
available. This last requirement makes this algorithm impractical in real systems. 
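
The sensor side of the bidding step can be sketched as follows, using the bidding price $(e_{ij}/d_j) \times p_j$ given above. The function names `bidding_prices` and `choose_bids`, the dictionary layout, and the tie-breaking by mission index are assumptions made for this illustration.

```python
from typing import Dict, List


def bidding_prices(e_i: Dict[int, float], p: Dict[int, float],
                   d: Dict[int, float]) -> Dict[int, float]:
    """Bidding price of one sensor for each advertised mission it can contribute to.

    e_i maps mission index j to the sensor's utility e_ij for that mission.
    """
    return {j: (e_ij / d[j]) * p[j] for j, e_ij in e_i.items() if e_ij > 0}


def choose_bids(e_i: Dict[int, float], p: Dict[int, float],
                d: Dict[int, float], N: int) -> List[int]:
    """Missions the sensor bids on: the N highest bidding prices, ties by mission index."""
    prices = bidding_prices(e_i, p, d)
    ranked = sorted(prices, key=lambda j: (-prices[j], j))
    return ranked[:N]


# Hypothetical example: one sensor hearing advertisements from three missions.
e_i = {0: 0.5, 1: 0.8, 2: 0.2}
p = {0: 4.0, 1: 1.0, 2: 10.0}
d = {0: 2.0, 1: 1.0, 2: 4.0}
print(choose_bids(e_i, p, d, N=2))  # [0, 1]
```

Here the bidding prices are 1.0, 0.8, and 0.5, so the high raw profit of mission 2 does not win a bid because this sensor covers only a small fraction of its demand.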


6.2.3.2 Multi-Round Proposal Algorithm 


In this algorithm, each mission leader advertises its mission information (demand, profit, and 
location) to the nearby sensors, as in the bidding algorithm. When a nearby sensor hears 
such an advertisement message for one or more missions (the set of advertising missions is denoted 
by $Q$), it sends a single proposal to the mission it perceives to be its best match. The ranking 
of missions is based on the profit of a mission weighted by the fraction of the mission’s demand 
that the sensor can satisfy. Using the notation introduced in Section 6.2.1, sensor $S_i$ ranks mission 
$M_j$ according to $B_{ij}$, where $B_{ij} = (e_{ij}/d_j) \times p_j$. The leader, on the other hand, selects from the set 
of proposing sensors $G$ in a greedy fashion according to their contribution to the mission. If the 
leader of a mission does not select a proposing sensor, then in the next round the sensor proposes 
to the next mission on its list. This algorithm consists of a series of proposal-reply rounds. The 
more rounds, the better the assignment may be. However, as the number of rounds increases, 
the communication cost grows and only diminishing returns can be obtained. Hence, there is a 
trade-off between solution quality and communication cost. 

Since the aim is to achieve the highest profit from successful missions, a mechanism to prevent 
missions that will never be fully successful from holding up sensors that can help other missions 
is used. In each round, mission leaders assess the satisfaction level of their missions. If the level 
is not greater than an increasing threshold ($\alpha(k)$ for round $k$) then the mission is assumed to be 
unattainable and all its sensors are released. The threshold is initialized to a fixed value (e.g., 10% 
satisfaction) and incremented each round (e.g., by 10%). After a sufficient number of rounds, it 
will reach $T$, the preset value of the success threshold, at which time all missions that are not 
yet successful release their sensors. The rising threshold therefore yields two benefits: increasing 
the chance that the most satisfied missions will become fully satisfied and preventing sensors from 
spending their energy on missions that will not reach the minimum success threshold (for which 
no profit is received). Algorithm 6.2 summarizes the steps taken by both the mission leaders and 
surrounding sensors. 


Algorithm 6.2 Multi-round proposal 


For leader of mission $M_j$: 
INPUT: Set of proposing sensors $G$, $e_{ij}\ \forall S_i \in G$, $d_j$, $\alpha(\cdot)$ and number of rounds 
for each round $k$ 
    send advertisement 
    wait for proposing sensors 
    sort proposing sensors in decreasing order of $e_{ij}$ 
    while $u_j < d_j$ 
        assign the next $S_i$ (in sorted order) to $M_j$ 
        $u_j = u_j + e_{ij}$ 
    if $M_j$'s satisfaction level $< \alpha(k)$ then 
        release all sensors assigned to $M_j$ 
OUTPUT: sensor assignment 


For sensor $S_i$: 
INPUT: Set of advertising missions $Q$, $e_{ij}$, $p_j$, $d_j\ \forall M_j \in Q$ and number of rounds 
for each round $k$ 
    if $S_i$ is unassigned then 
        receive advertisement from all nearby missions 
        ignore any mission already considered 
        $j \leftarrow \arg\max_j B_{ij} = (e_{ij}/d_j) \times p_j$ 
        send proposal to $M_j$ 
    else break 
OUTPUT: mission $M_j$ to which sensor $S_i$ is assigned 


We now discuss both the runtime complexity and message complexity of Algorithm 6.2, 
assuming that the number of sensors is $n$, number of missions is $m$, and number of rounds is $k$. 
The running time for sensor $S_i$ is $O(km)$ as in each round a sensor may consider up to $m$ missions. 
The mission leader’s complexity is $O(n \log n)$ as in each round the mission leader has to sort up 
to $n$ proposing sensors and select the best ones, but over all the rounds each sensor proposes to a 
mission at most one time. Assuming that advertisement messages are only broadcast to immediate 
neighbors, the message complexity of the algorithm including both sides, the sensor and the 
mission, is $O(m + kn)$, as there are $m$ advertisement messages, $O(kn)$ proposals by sensors, and 
$O(n)$ reply messages from mission leaders. 
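
The interplay between the two sides of Algorithm 6.2 can be illustrated with a small centralized simulation. This is only a sketch under stated assumptions: the function name `mrpa`, the parameters `alpha0` and `step` (the initial release threshold and its per-round increment), the tie-breaking rules, and the choice to simulate message exchange with in-memory lists are all introduced here for illustration and are not taken from [12].

```python
from typing import Dict, List


def mrpa(e: List[List[float]], p: List[float], d: List[float], T: float,
         rounds: int, alpha0: float = 0.1, step: float = 0.1) -> Dict[int, int]:
    """Simulate multi-round proposals; returns {sensor index: mission index}."""
    n, m = len(e), len(p)
    assigned: Dict[int, int] = {}
    considered = [set() for _ in range(n)]   # missions each sensor already proposed to
    received = [0.0] * m                     # utility collected by each mission
    for k in range(rounds):
        alpha = min(T, alpha0 + k * step)    # rising release threshold, capped at T
        # Sensor side: every unassigned sensor proposes to its best untried mission.
        proposals: Dict[int, List[int]] = {j: [] for j in range(m)}
        for i in range(n):
            if i in assigned:
                continue
            options = [j for j in range(m) if e[i][j] > 0 and j not in considered[i]]
            if options:
                best = max(options, key=lambda j: (e[i][j] / d[j] * p[j], -j))
                considered[i].add(best)
                proposals[best].append(i)
        # Leader side: accept proposers greedily, then release if below the threshold.
        for j in range(m):
            for i in sorted(proposals[j], key=lambda s: (-e[s][j], s)):
                if received[j] >= d[j]:
                    break
                assigned[i] = j
                received[j] += e[i][j]
            if received[j] / d[j] < alpha:
                for i in [s for s, mj in assigned.items() if mj == j]:
                    del assigned[i]
                received[j] = 0.0
    return assigned


# Hypothetical toy instance: three sensors (rows), two missions (columns).
e = [[0.6, 0.5], [0.6, 0.0], [0.0, 0.35]]
print(mrpa(e, p=[2.0, 1.0], d=[1.0, 1.0], T=0.5, rounds=1))  # {0: 0, 1: 0, 2: 1}
print(mrpa(e, p=[2.0, 1.0], d=[1.0, 1.0], T=0.5, rounds=5))  # {0: 0, 1: 0}
```

With a single round, sensor 2 stays attached to mission 1 even though that mission reaches only 35% of its demand. With five rounds the release threshold climbs past 35%, so mission 1 gives up its sensor instead of holding it without earning profit.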


6.2.3.3 Dynamic Proposal Algorithm 


MRPA is designed to work in the cases in which we have prior knowledge about the missions. 
The multiple rounds allow it to work well even if there are several missions that compete for the 
same sensing resources. Since it requires complete knowledge about missions it can also work in 
an environment that does not need fast response to new missions. An example would be a system 
that batches together a number of missions and runs the assignment algorithm periodically, for 
example, every few hours. However, in a fully dynamic setting, the network needs to have a fast 
response to incoming and outgoing missions. By handling missions as they arrive, it is expected 
to encounter less competition for sensing resources, and so a lighter-weight algorithm that does 
not need multiple rounds to complete can be used. This algorithm is called the Dynamic Proposal 
Algorithm or DPA [12]. 

As in MRPA, each mission has a leader that advertises mission information to nearby sensors. 
A sensor that hears this announcement can be in one of two states: (1) not assigned, in which case it 
proposes to the mission with its utility or (2) assigned to a mission, in which case the sensor calculates 
its effective profit for both missions (which is the $B_{ij}$ value found above) and chooses either to stay 
with the current mission or to propose to the new mission, depending on which value is higher. 
So, this algorithm allows a mission to preempt an ongoing mission to increase the overall profit of 
the network. 

After the mission leader collects the proposals, it tries first to satisfy the mission demands with 
sensors in the not assigned state by greedily picking sensors with highest utility. If these sensors are not 
sufficient, it tries to steal sensors from other ongoing missions, that is, it chooses from sensors in the 
assigned state. If the collected utility at that time is at least $T$, then the mission leader sends assignment 
messages to the respective sensors, which start collecting information to support the mission. 
If a sensor is selected which preempts an existing mission, the following procedure is followed. 

Let us say that a new mission $M_j$ with leader $L_j$ started in an area close to an ongoing mission 
$M_k$ with leader $L_k$. If a sensor $S_i$ that is currently assigned to $M_k$ decides that its contribution will 
generate better profit if it is assigned to $M_j$, it notifies $L_k$ of its intention. $L_k$ then tries to find one or 
more sensors to replace $S_i$. If no such sensor(s) are found, the leader will agree on the reassignment 
as long as its current satisfaction level does not drop below $T$, which would cause the mission to 
fail. If the release of $S_i$ will bring the allocated utility to lower than $T$, then the reassignment is 
temporarily denied. If sensor $S_i$ is critical to the new mission $M_j$, that is, without it the mission 
will fail, a second test is performed. If the current profit value of $M_j$ with $S_i$ assigned to it is greater 
than that of $M_k$, the leader of $M_k$ will release its hold on $S_i$ and agree on the reassignment even if it 
will cause its own mission to fail. The reassignment becomes final once $S_i$ is selected by $M_j$. Only 
at that time are the replacement sensor(s) activated. Algorithm 6.3 summarizes the steps taken by 
a mission leader and sensor $S_i$'s response to the reception of an advertisement message. 


Algorithm 6.3 Dynamic proposal 


For leader of mission $M_j$: 
INPUT: Set of proposing sensors $G$, $e_{ij}\ \forall S_i \in G$ and $d_j$ 
send advertisement of mission $M_j$ 
sort proposing sensors in decreasing order of $e_{ij}$ 
while $u_j < d_j$ 
    assign the next $S_i$ (in sorted order) to $M_j$ 
    $u_j = u_j + e_{ij}$ 
OUTPUT: sensor assignment 


For sensor $S_i$: 
INPUT: Set of advertising missions $Q$ and $e_{ij}$, $p_j$, $d_j\ \forall M_j \in Q$ 
receive advertisement of mission $M_j$ 
if $S_i$ is not active then 
    propose to $M_j$ with offer $e_{ij}$ 
else if $S_i$ is assigned to $M_k$ with $(e_{ik}/d_k) \times p_k < (e_{ij}/d_j) \times p_j$ then 
    ask current leader $L_k$ for reassignment 
    propose to $M_j$ with offer $e_{ij}$ only if current leader agrees 
    if selected for mission $M_j$ then 
        notify leader of $M_k$ to assign replacement(s) 
    else 
        continue operation on $M_k$ 
OUTPUT: mission $M_j$ to which sensor $S_i$ is assigned 


To reduce both the interruption of ongoing missions and the communication overhead, preemption 
is limited to one level. That is, if mission $M_j$ preempted mission $M_k$, $M_k$ will try to satisfy 
its demand with only available sensors and will not try to steal sensors that are already assigned. 
When a mission ends, the leader sends out a message to announce that the mission has ended and 
all assigned sensors are released. 

Because the system is dynamic, missions that are not fully satisfied after the first assignment 
process will retry to obtain more sensors after some time. However, they only retry if there will be 
more available sensors. This can happen in the case when a nearby mission terminates and has its 
sensors released. This information can be learned either from the base station or by overhearing the 
message announcing the end of a mission. 
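
The sensor-side rule of Algorithm 6.3 and the first release test applied by the current leader can be sketched as two small functions. The names `effective_profit`, `sensor_response`, and `leader_agrees`, the tuple layout, and the `replacement_utility` parameter are assumptions made for this illustration. The second test for sensors that are critical to the new mission is deliberately left out to keep the sketch short.

```python
from typing import Optional, Tuple

Offer = Tuple[float, float, float]  # hypothetical layout: (e_ij, d_j, p_j)


def effective_profit(e: float, d: float, p: float) -> float:
    """Profit of a mission weighted by the fraction of its demand the sensor covers."""
    return e / d * p


def sensor_response(current: Optional[Offer], new: Offer) -> str:
    """Decide how a sensor reacts to the advertisement of a new mission."""
    if current is None:
        return "propose"                    # idle sensors always propose
    if effective_profit(*current) < effective_profit(*new):
        return "request_reassignment"       # must first ask the current leader
    return "stay"


def leader_agrees(received: float, e_leaving: float, d: float, T: float,
                  replacement_utility: float = 0.0) -> bool:
    """First release test: the mission must stay at or above the success threshold."""
    remaining = received - e_leaving + replacement_utility
    return remaining / d >= T


print(sensor_response(None, (0.5, 1.0, 1.0)))             # propose
print(sensor_response((0.5, 1.0, 1.0), (0.5, 1.0, 2.0)))  # request_reassignment
print(leader_agrees(received=0.6, e_leaving=0.3, d=1.0, T=0.5))  # False
```

In the last call the ongoing mission would fall to 30% satisfaction without the sensor, so the request is denied unless a replacement is found or the second test for critical sensors overrides it.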

We now discuss the runtime complexity and message complexity of Algorithm 6.3. Assume that 
the number of sensors is $n$ and the number of missions is $m$; then the running time for sensor $S_i$ is 
$O(m)$ as it may consider up to $m$ missions. The reassignment takes constant time for the sensor. The 
mission leader’s complexity, on the other hand, is $O(n \log n)$ as the mission leader has to sort up to 
$n$ proposing sensors and select the best ones. Again, assuming that mission advertisement messages 
are only broadcast to immediate neighbors, the message complexity of the algorithm including 
both sides, the sensor and the mission, is $O(m + n)$ as there are $m$ advertisement messages, $O(n)$ 
proposals by sensors, and $O(n)$ replies from mission leaders. As the algorithm does not need several 
rounds to complete, a saving of factor $k$ in the number of messages sent by the sensors is realized 
over MRPA. 


6.2.3.4 Energy-Aware Dynamic Proposal Algorithm 


A drawback of DPA is that it does not consider the remaining energy in sensors when making 
assignment decisions. It selects a sensor based only on the utility it provides. However, the energy 
level of such a sensor may have been depleted over time and using it for sensing will consume its 
remaining energy leading to its death. This can happen while other sensors around it that provide 
lower utility may still have full energy. With this observation, DPA can be extended to make it 
energy-aware [called energy-aware dynamic proposal algorithm (EDPA)]. EDPA uses information 
about the proposing sensors' current remaining energy levels to make better assignment decisions 
that would ultimately lead to a longer lifetime. 

Instead of using the utility of a sensor to the mission alone to make assignment decisions, EDPA 
uses a function ($f$) of utility ($U$) and fraction of remaining energy ($E$). We define 


$$f(U, E) = U \times E^{\beta}$$


where $\beta$ is a design parameter. If $\beta$ is zero, EDPA becomes DPA and hence only the utility is 
considered. A higher value of $\beta$ gives more preference to sensors with more remaining energy. 

To consume energy more evenly among sensors, after the initial assignment, possible sensor 
candidates for a mission send periodic updates to the leader including their current energy levels. 
The leader then checks if it has consumed energy unevenly among all sensors that can contribute 
to the mission. At that time, the leader may choose to change the assignments of sensors by 
reapplying the decision function f. This rotation of active sensors is similar to the technique used 
in LEACH to rotate cluster heads [5]. 
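
The effect of the energy exponent on the leader's choice can be seen in a short sketch. The exponent is written `beta` here, and the function names `edpa_score` and `rank_candidates`, the tuple layout, and the tie-breaking by sensor identifier are illustrative assumptions.

```python
from typing import List, Tuple

Candidate = Tuple[str, float, float]  # hypothetical layout: (sensor id, utility U, energy fraction E)


def edpa_score(U: float, E: float, beta: float) -> float:
    """Decision function f(U, E) = U * E**beta; beta = 0 ignores energy entirely."""
    return U * E ** beta


def rank_candidates(candidates: List[Candidate], beta: float) -> List[str]:
    """Sensor ids in the order a leader would pick them, best score first."""
    ordered = sorted(candidates, key=lambda c: (-edpa_score(c[1], c[2], beta), c[0]))
    return [sensor_id for sensor_id, _, _ in ordered]


# Sensor "a" offers more utility but is nearly depleted; "b" is fresh.
candidates = [("a", 0.9, 0.2), ("b", 0.6, 1.0)]
print(rank_candidates(candidates, beta=0))  # ['a', 'b']  (plain DPA ordering)
print(rank_candidates(candidates, beta=1))  # ['b', 'a']  (energy-aware ordering)
```

Reapplying the same ranking whenever the periodic energy updates arrive gives the rotation of active sensors described above, because a sensor's score drops as its energy fraction falls.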

The periodic updates increase the communication overhead, but as is shown in the following 
section, this increase is not very high. As expected, this algorithm works better in a dense network 
in which there are many sensors that apply to a mission and hence more choices are available to 
the leader. 


6.2.4 Performance Evaluation 


The algorithms were evaluated using a simulator built in Java and tested on randomly 
generated problem instances [12]. Two sets of experiments were performed for the static and 
dynamic settings. For both cases the performance of the centralized greedy algorithm and the 
distributed algorithms were tested and compared. In the static setting, it is assumed that the entire 
problem instance, including all sensors and all missions, is given simultaneously. In the dynamic 
setting, missions arrive over time and depart after spending a certain amount of time being active. 
For the dynamic case, EDPA is shown to improve network lifetime by using information about 
remaining energy of sensors to make better selection decisions. 


6.2.4.1 Assumptions 


Each mission has a demand, an abstract value of the amount of sensing resources it requires, which 
is exponentially distributed with an average of 2 and a maximum of 6. Also associated with each 
mission is a profit value, which measures its importance. The profit is also exponentially distributed, 
but with an average of 1. This simulates common scenarios in which many missions demand few 
sensing resources and a smaller number demand more resources. The same applies to profit. The 
profit obtained from a successful mission $M_j$ is equal to $p_j(u_j)$ as defined in Section 6.2.1. A mission 
is considered successful if it receives at least 50% of its demanded utility from allocated sensors 
(i.e., $T = 0.5$). Each sensor can only be assigned to a single mission. 

The utility that sensor $S_i$ provides to mission $M_j$ is defined as a function of the distance 
$D_{ij}$ between them. Many types of sensors exhibit some kind of quality deterioration or signal 
attenuation based on distance. In order to evaluate their utilities to missions, it is assumed that all 
sensors know their geographical locations. Formally, the potential utility contribution is 


$$e_{ij} = \begin{cases} \dfrac{1}{1 + D_{ij}^2 / c}, & \text{if } D_{ij} \le R_s \\ 0, & \text{otherwise} \end{cases} \qquad (6.1)$$


where $R_s = 30$ m is the global sensing range. This models typical signal attenuation, which is an 
inverse function of the distance squared. Note that this utility function is only used for testing and 
is not meant to model the exact behavior of any sensor. In the following experiments $c = 60$. 

Sensors are deployed in uniformly random locations in a 400 m x 400 m field (the base station 
is located in the center of the left edge). Missions also are created in uniformly random locations 
in the field. The communication range of sensors is set to 40 m. 
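
A random problem instance of the kind described above can be generated with a few lines of Python. This sketch is not the Java simulator used in [12]: the function names `utility` and `random_instance` are hypothetical, the squared-distance form of the utility follows the description of Equation 6.1, and enforcing the maximum demand of 6 by clipping (rather than by resampling) is an assumption, since the text does not say how the maximum is applied.

```python
import math
import random
from typing import List, Tuple


def utility(distance: float, sensing_range: float = 30.0, c: float = 60.0) -> float:
    """Utility 1 / (1 + D^2 / c) within the sensing range, and 0 beyond it."""
    if distance > sensing_range:
        return 0.0
    return 1.0 / (1.0 + distance ** 2 / c)


def random_instance(n_sensors: int, n_missions: int, field: float = 400.0,
                    seed: int = 0) -> Tuple[List[List[float]], List[float], List[float]]:
    """Return (e, profit, demand) for uniformly placed sensors and missions."""
    rng = random.Random(seed)
    sensors = [(rng.uniform(0, field), rng.uniform(0, field)) for _ in range(n_sensors)]
    missions = [(rng.uniform(0, field), rng.uniform(0, field)) for _ in range(n_missions)]
    # Exponential demand with mean 2, clipped at 6 (clipping is an assumption).
    demand = [min(rng.expovariate(1 / 2.0), 6.0) for _ in range(n_missions)]
    # Exponential profit with mean 1.
    profit = [rng.expovariate(1.0) for _ in range(n_missions)]
    e = [[utility(math.dist(s, m)) for m in missions] for s in sensors]
    return e, profit, demand


print(utility(0.0))   # 1.0
print(utility(30.0))  # 0.0625
print(utility(31.0))  # 0.0
```

With $c = 60$ a sensor at the edge of the 30 m sensing range still contributes one sixteenth of the utility of a co-located sensor, after which the contribution drops to zero.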

As an upper bound the profit results for the optimal fractional solution are included. This is 
the optimal solution for the relaxed fractional problem in which sensors may divide their utility 
between multiple missions and all fractional profits are counted, regardless of whether the success 
threshold is reached or not. The algorithms’ performance may be judged in comparison to this 
upper bound. 


6.2.4.2 Static Scenarios 


Setup: In this experiment all missions occur simultaneously, with the same start and end times. 
The number of sensors in the field is fixed to 500 and the number of missions varies from 10 to 
100. The results show the average of 10 runs. 

For the centralized approaches, results for the greedy algorithm are shown. For the distributed 
approach, results for MRPA are shown with one round, three rounds, and six rounds. These 
results illustrate the trade-off between solution quality and communication overhead. The growing 
threshold a for a mission to release sensors in MRPA is set to 10% in the first round and is increased 
by 10% for each subsequent round until it reaches T, or 50%. Advertisement messages are sent
from mission leaders to all sensors within two hops. 

Results: The first set of results (Figure 6.2) shows the fraction of the maximum mission profits 
achieved by the different algorithms compared to the optimal fractional profits. The maximum 
profit is the sum of all missions' profits. The greedy centralized solution performs best, followed by
the six-round proposal. However, its advantage lessens as the density of the sensors increases. For 
a same-sized field with 1000 sensors deployed (not shown in the figures), all the curves except the 
one-round proposal become nearly aligned. This is expected since there are more sensors that can be 
assigned to the different missions. Note that the improvement in MRPA when going from a single 
round to three rounds is very pronounced. However, the improvement gained when jumping to 
six rounds is less apparent and may not justify the necessary communications overhead. 

Figure 6.3 shows the communication overhead of the different algorithms. As expected, the 
centralized algorithm has the highest overhead. With MRPA, as the number of rounds increases, more
messages are exchanged. The savings in the number of exchanged messages become more evident
if a dynamic system, in which sensor-mission utility values can change over time, is considered.

Figure 6.3 Number of messages.


In this case, the centralized algorithm needs to collect information about current sensor utility values 
before running an assignment process for each new mission that arrives. Distributed algorithms, 
on the other hand, require the exchange of fewer messages since information about utility values 
only needs to be sent to the leader of the new mission, which is just a few hops away. 

As the numbers of sensors and missions increase, the distributed algorithms encounter greater
overlap in the areas of local assignment, which leads to more exchanged messages. This is
because, as the densities of sensors and missions increase, more sensors can contribute to each mission
and, at the same time, each sensor can contribute to more missions.

Figure 6.4 Number of assigned sensors.


The number of sensors assigned can be used as a proxy for the amount of energy used (shown 
in Figure 6.4). The number of sensors used by the centralized algorithm is very close to that of the 
six-round proposal algorithm. For MRPA, few sensors are assigned if one round is used. This is 
because this setting does not allow sensors that were rejected in this round to repropose to other 
missions. With three rounds, the mission leaders release sensors that are not useful which allows 
these sensors to repropose to other missions and hence the number of assigned sensors becomes 
higher. With six rounds most of the unused sensors are released which brings the number of the 
assigned sensors down to about that of the centralized algorithm. Note that the algorithms do not 
use all the available sensors even when the number of missions is large. This is because some sensors 
are not within the sensing range of any mission and hence remain idle. When considering the results
in this figure we should take into consideration the achieved profits in Figure 6.2. For example, 
even though the centralized greedy algorithm assigns close to 250 sensors when 100 missions are 
present, it achieves less than 60% of the possible profits. 

Finally, Figure 6.5 shows the fraction of satisfied missions for the different algorithms. Note that 
the goal is not to maximize this number, but rather to achieve the highest profit. The centralized 
algorithm is successful in achieving the highest profit values, but not always the largest fraction of 
satisfied missions. The six-round proposal achieves a higher fraction when the number of missions is
large. This happens because the greedy centralized algorithm assigns sensors to missions in order of
profit and hence may stop satisfying missions after a certain point because sensors are no longer
available. So, even though its fraction of satisfied missions is lower than that of the six-round proposal, its
profit is higher because more profitable missions were picked. Because it lacks a global view,
the six-round proposal algorithm satisfies more missions, but with lower profits. Among the
three MRPA configurations tested, there is a significant increase in the fraction of satisfied missions
between one round and three rounds and less improvement between three rounds and six rounds.

From the results obtained, we can see that the distributed algorithms perform well. The difference 
in achieved profit values compared to the centralized algorithm is less than 8%. At the same time, 
they save as much as 50% of the transmitted messages.

Figure 6.6 shows the relation between achieved profits and number of rounds. The number of
sensors is fixed to 500 and the number of missions to 50 or 100 (depending on the experiment). As can
be expected, achieved profits initially increase with the number of rounds. However, the additional
gains beyond eight rounds are small since by that time all attainable missions have reached their
success threshold. Figure 6.7 shows the communication overhead, which increases linearly with the
number of rounds.

Figure 6.7 Number of messages vs. rounds.


6.2.4.3 Dynamic Scenarios 


Setup: Now we discuss the performance of the distributed proposal algorithm in a dynamic setting 
in which missions arrive over time. The same aforementioned assumptions apply here. Moreover, 
it is assumed that missions arrive according to a Poisson distribution. The mission lifetimes are 
selected according to an exponential distribution with an average lifetime of 1 h and a maximum of
4 h. The exponential distribution models realistic scenarios in which there are
many short-lived missions and few long-lived ones. Although a centralized algorithm is impractical
in dynamic scenarios, due to high communication cost, its results are included to measure the 
performance of the distributed algorithm. The centralized algorithm, in this case, is rerun for each 
mission arrival and departure. 
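
The arrival model above can be sketched as follows. This is an illustration only: it assumes that "Poisson arrivals" means exponentially distributed gaps between consecutive missions, and that the 4 h maximum lifetime is enforced by clamping (the text does not say whether the original simulator clamps or resamples). The function name and parameters are hypothetical.

```python
import random


def generate_missions(rate_per_h, horizon_h, mean_life_h=1.0, max_life_h=4.0, seed=0):
    """Return a list of (start_h, end_h) pairs for missions arriving before horizon_h."""
    rng = random.Random(seed)
    missions = []
    t = 0.0
    while True:
        # Exponential inter-arrival gaps give a Poisson arrival process.
        t += rng.expovariate(rate_per_h)
        if t >= horizon_h:
            break
        # Assumption: lifetimes above the maximum are clamped, not redrawn.
        life = min(rng.expovariate(1.0 / mean_life_h), max_life_h)
        missions.append((t, t + life))
    return missions


if __name__ == "__main__":
    trace = generate_missions(rate_per_h=6.0, horizon_h=50.0)
    print(len(trace), "missions generated")
```

Clamping slightly lowers the mean lifetime below 1 h; resampling would be an equally plausible reading of the setup.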

Results: The performance of DPA is compared to results achieved by the greedy centralized 
algorithm and the optimal fractional solution. Figures 6.8 and 6.9 show a trace of the achieved 
network profits during a period of 12 h for arrival rates λ = 3 missions/h and 6 missions/h,
respectively, with a network of 500 sensors. The simulation is started at time zero and the trace is 
collected after 10 h to allow the network to reach steady state. As can be seen from the figures, the 
performance of DPA, which utilizes local information about missions, is very close to that of the 
centralized algorithm. 

Figures 6.10 and 6.11 show the average performance over a period of 50 h (averaged over 
10 runs) for a network with 500 sensors and 1000 sensors. Figure 6.10 shows the average achieved 
profits per unit of time (fraction of maximum) as the mission arrival rate is varied. We see that
the centralized algorithm and DPA perform almost equally. Note that, as expected, with a larger number of
sensors the network can achieve higher profits. 

Figure 6.9 Trace of network performance (λ = 6 missions/h).

The communication overhead of DPA is shown in Figure 6.11. The number of messages grows
linearly as the number of missions in the network increases. The number of messages for the
centralized algorithm is not shown because it is very large compared to that of DPA. For each
mission arrival and departure, the base station needs to collect information from all sensors that can
contribute to the arriving mission to get status updates. The average number of messages exchanged
per mission is around 40 for 500 sensors and 80 for 1000 sensors. This includes all the messages
needed to advertise the mission and make all the assignment decisions, including reassignments.

Network lifetime: Figures 6.12 through 6.14 show the results for EDPA in a network of 
500 sensors and 1000 sensors (average of 10 runs). The mission arrival rate is set to 6 missions/h. 
All sensors start with energy to support 10 h of continuous sensing. Only energy consumed 
for sensing is considered. Sensor reassignment is performed every 20 min to balance energy 
consumption. Choosing a smaller period may yield a more uniform assignment, but will have a 
larger communication overhead. 

The network lifetime is defined here as the time until the first sensor dies. Figure 6.12 shows the
lifetime of the network for different values of β, a parameter used to control dependence on
remaining energy. Recall that when β = 0, EDPA becomes DPA. The results show that when EDPA is
used, network lifetime increases by 50% and 70%, for networks with 500 sensors and 1000 sensors,
respectively. The increase is notable when β goes from 0 to 1, that is, when energy is taken into
account. After that point, the increase in lifetime is not very pronounced. The denser the network,
the more options the assignment algorithm has and hence it is able to achieve longer lifetime.

Figure 6.13 shows the total achieved profits (sum of profit in every second during lifetime) for
the different values of β. This can be thought of as the area under the curves for the two algorithms
shown in Figures 6.8 and 6.9. As expected, the profits increase when lifetime increases. But when
there are more sensors, the profit increase is more prominent due to the longer lifetime and the
fact that more sensors allow for more satisfaction for missions.

Due to periodic updates, the communication overhead increases when EDPA is used (β > 0).
Figure 6.14 shows the average number of messages per attempted mission. Note that there is
an average increase of around 50% for both network sizes. Although the percentage increase may
seem high, the actual number of exchanged messages per mission (around 60 for 500 sensors and
120 for 1000 sensors) is relatively small, especially considering that this number includes all the
messages exchanged to set up a mission and is amortized over its lifetime.

Figure 6.14 Messages per mission using EDPA.


6.3 Sensor Assignment in Budget-Constrained Environments


In this section and the next, we examine other variants of sensor-assignment problems motivated
by conservation of resources [6]. Again, we consider two broad classes of environments: static and
dynamic (see Section 6.4). The static setting is motivated by situations in which different users are
granted control over the sensor network at different times. During each time period, the current
user may have many simultaneous missions. While the current user will want to satisfy as many 
of these as possible, sensing resources may be limited and expensive, both in terms of equipment 
and operational cost. In some environments, replacing batteries may be difficult, expensive, or 
dangerous. Furthermore, a sensor operating in active mode (i.e., assigned to a mission) may be 
more visible than a dormant sensor, and so is in greater danger of tampering or damage. Therefore, 
we give each mission in the static problem a budget so that no single user may overtax the network 
and deprive future users of resources. This budget serves as a constraint in terms of the amount of 
resources that can be allocated to a mission regardless of profit. 

In this section, we discuss an efficient greedy algorithm and an MRPA whose subroutine solves 
a generalized assignment problem (GAP). The performance evaluation results show that in dense 
networks both algorithms perform well, with the GAP-based algorithm slightly outperforming the 
greedy algorithm. This section is based on [6]. 


6.3.1 Problem Definition 


With multiple sensors and multiple missions, sensors should be assigned in an intelligent way. This 
goal is shared by all the problem settings we consider. There are a number of attributes, however, 
that characterize the nature and difficulty of the problem. 

The static setting is similar to the one presented in Section 6.2. Given is a set of sensors
S_1, ..., S_n and a set of missions M_1, ..., M_m. Each mission M_j is associated with a utility demand d_j,
indicating the amount of sensing resources needed, and a profit p_j, indicating the importance of
the mission. Each sensor-mission pair is associated with a utility value e_ij that mission j will receive
if sensor i is assigned to it. This can be a measure of the quality of information that a sensor can
provide to a particular mission. To simplify the problem, it is again assumed that the utility values
e_ij received by a mission j are additive (similar to Section 6.2). Finally, a budgetary restriction
is given in some form, either constraining the entire problem solution or constraining individual
missions as follows: each mission has a budget b_j, and each potential sensor assignment has cost c_ij.
All the aforementioned values are positive reals, except for costs and utility, which could be zero.
The most general problem is defined by the following mathematical program (MP) P:


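\[
\begin{aligned}
\text{Maximize:} \quad & \sum_{j=1}^{m} p_j(y_j) \\
\text{Such that:} \quad & \sum_{i=1}^{n} x_{ij} e_{ij} \ge d_j y_j, && \text{for each } M_j, \\
& \sum_{i=1}^{n} x_{ij} c_{ij} \le b_j, && \text{for each } M_j, \\
& \sum_{j=1}^{m} x_{ij} \le 1, && \text{for each } S_i, \\
& x_{ij} \in \{0, 1\} \;\; \forall x_{ij}, \text{ and} \\
& y_j \in [0, 1] \;\; \forall y_j.
\end{aligned}
\]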
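
To make the constraints of program P concrete, the following sketch evaluates a candidate assignment. It is an illustration under stated assumptions: the assignment is stored as a dictionary from sensor index to mission index (so the "at most one mission per sensor" constraint holds by construction), and `evaluate_assignment`, `e`, `c`, and `b` are hypothetical names for the quantities e_ij, c_ij, and b_j.

```python
def evaluate_assignment(assignment, e, c, b):
    """Return (utility per mission, cost per mission, budget-feasible?).

    assignment: dict sensor index i -> mission index j (x_ij = 1).
    e[i][j], c[i][j]: utility and cost of assigning sensor i to mission j.
    b[j]: budget of mission j.
    """
    m = len(b)
    received = [0.0] * m   # u_j = sum_i x_ij * e_ij
    spent = [0.0] * m      # sum_i x_ij * c_ij
    for i, j in assignment.items():
        received[j] += e[i][j]
        spent[j] += c[i][j]
    feasible = all(spent[j] <= b[j] for j in range(m))
    return received, spent, feasible


if __name__ == "__main__":
    e = [[1.0, 0.0], [0.5, 0.5]]
    c = [[1.0, 1.0], [2.0, 2.0]]
    b = [3.0, 1.0]
    print(evaluate_assignment({0: 0, 1: 0}, e, c, b))
```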


A sensor can be assigned (x_ij = 1) at most once. Profits are received per mission based on its
satisfaction level y_j. Note that y_j corresponds to u_j/d_j within the range [0,1], where u_j = Σ_i x_ij e_ij.
With strict profits, a mission receives exactly profit p_j iff u_j ≥ d_j. With fractional profits, a mission
receives a fraction of p_j proportional to its satisfaction level y_j and at most p_j. More generally, similar
to the problem introduced in Section 6.2.1, profits can be awarded fractionally, after reaching a
fractional satisfaction threshold T:

\[
p_j(u_j) =
\begin{cases}
p_j, & \text{if } u_j \ge d_j, \\
p_j \cdot u_j/d_j, & \text{if } T \le u_j/d_j < 1, \\
0, & \text{otherwise.}
\end{cases}
\]

When T = 1, program P is an IP; when T = 0, it is a mixed IP with the decision variables x_ij
still integral.
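
The threshold profit rule can be sketched in a few lines. The function name and argument names are illustrative; the logic assumes the three-case reading of the profit function (full profit at or above demand, proportional profit between the threshold T and full satisfaction, zero below T).

```python
def mission_profit(p_j, u_j, d_j, T):
    """Profit of a mission with profit p_j, received utility u_j, demand d_j > 0."""
    satisfaction = u_j / d_j
    if satisfaction >= 1.0:
        return p_j                    # demand fully met
    if satisfaction >= T:
        return p_j * satisfaction     # fractional profit above the threshold
    return 0.0                        # below the threshold: no profit


if __name__ == "__main__":
    for u in (0.5, 1.0, 2.0):
        print(u, mission_profit(10.0, u, 2.0, T=0.5))
```

Setting T = 1 recovers strict profits, and T = 0 recovers purely fractional profits.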

The edge values e_ij may be arbitrary non-negative values, or may have additional structure. If
sensors and missions lie in a metric space, such as the line or plane, then edge values may be based
in some way on the distance D_ij between sensor i and mission j. In the binary sensing model, e_ij is
equal to 1 if distance D_ij is at most the sensing range R_s, and 0 otherwise. In another geometric
setting, e_ij may vary smoothly based on distance, according to a function such as 1/(1 + D_ij).

Similarly, the cost values c_ij could be arbitrary or could exhibit some structure: the cost could
depend on the sensor involved, or could, for example, correlate directly with distance D_ij to
represent the difficulty of moving a sensor to a certain position. It could also be unit, in which case
the budget would simply constrain the number of sensors.

Even if profits are unit, demands are integers, edge values are 0/1, and budgets are infinite,
this problem is NP-hard and as hard to approximate as maximum independent set, as shown
in [12].


6.3.2 Algorithms 


In this section, we describe two algorithms to solve the static-assignment problem: greedy and
multi-round generalized assignment problem (MRGAP). The former requires global knowledge of all
missions to run and hence is considered centralized, whereas the latter can be implemented in both
centralized and distributed environments, a benefit in the sensor network domain.


6.3.2.1 Greedy


The first algorithm we consider (Algorithm 6.4) is a greedy algorithm that repeatedly attempts the
highest-potential-profit untried mission. Because fractional profits are awarded only beyond the
threshold percentage T, this need not be the mission with maximum p_j. For each such mission,
sensors are assigned to it, as long as the mission budget is not yet violated, in decreasing order
of cost-effectiveness, that is, the ratio of edge utility for that mission to the sensor cost. The
running time of the algorithm is O(n(m + log n)). No approximation factor is given for this
efficiency-motivated algorithm since, even for the first mission selected, there is no guarantee
that a feasible solution for it will be found. This by itself is, after all, an NP-hard 0/1 knapsack
problem.
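
The greedy procedure described above can be sketched in Python as follows. This is one illustrative reading of the description, not the authors' simulator: the data layout (`e[i][j]`, `c[i][j]`), the tie-breaking by lowest index, the treatment of zero-cost sensors as infinitely cost-effective, and the explicit tracking of the utility accumulated by the selected mission are all assumptions.

```python
def mission_profit(p_j, u_j, d_j, T):
    """Threshold profit rule: full, fractional (>= T), or zero."""
    s = u_j / d_j
    return p_j if s >= 1.0 else (p_j * s if s >= T else 0.0)


def greedy_assign(e, c, p, d, b, T):
    """Return dict sensor index -> mission index."""
    unused = set(range(len(e)))
    untried = set(range(len(p)))
    assignment = {}
    while untried:
        # Potential profit if a mission received every still-unused sensor.
        potential = {j: mission_profit(p[j], sum(e[i][j] for i in unused), d[j], T)
                     for j in untried}
        j = max(sorted(untried), key=potential.get)
        if potential[j] == 0:
            break
        untried.remove(j)

        def ratio(i):
            # Cost-effectiveness e_ij / c_ij; useless sensors sort last.
            if e[i][j] == 0:
                return 0.0
            return float("inf") if c[i][j] == 0 else e[i][j] / c[i][j]

        u_j = cost_j = 0.0
        for i in sorted(unused, key=lambda i: (-ratio(i), i)):
            if u_j >= d[j] or e[i][j] == 0:
                break
            if cost_j + c[i][j] <= b[j]:   # respect the mission budget
                assignment[i] = j
                unused.discard(i)
                u_j += e[i][j]
                cost_j += c[i][j]
    return assignment


if __name__ == "__main__":
    e = [[1.0, 0.0], [1.0, 0.5], [0.0, 1.0]]
    c = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
    print(greedy_assign(e, c, p=[10.0, 5.0], d=[2.0, 1.0], b=[2.0, 1.0], T=0.5))
```

Note that, as in the description, a mission that is attempted but ends below the threshold keeps its sensors; a practical variant might release them.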


6.3.2.2 Multi-Round GAP 


The idea of the second algorithm (shown in Algorithm 6.5) is to treat the missions as knapsacks 
that together form an instance of the GAP. The strategy of this algorithm is to find a good solution 
for the problem instance when treated as GAP, and then to do postprocessing to enforce the lower 
bound constraint of the profit threshold, by removing missions whose satisfaction percentage is 
too low. Releasing these sensors may make it possible to satisfy other missions, which suggests a
series of rounds. In effect, missions not making good progress toward satisfying their demands are 
precluded from competing for sensors in later rounds. 

Cohen et al. [4] give an approximation algorithm for GAP that uses a knapsack algorithm as
a subroutine. If the knapsack subroutine has approximation guarantee α ≥ 1, then the Cohen
GAP algorithm offers an approximation guarantee of 1 + α. The standard knapsack FPTAS [15]
is used, which yields a GAP approximation guarantee of 2 + ε. The post-processing step is used
to enforce lower bounds on profits for the individual knapsacks. This is an essential feature of the
sensor-assignment problem that is not considered by GAP.


Algorithm 6.4 Greedy algorithm for budget-constrained SMD
INPUT: S, M, e_ij, c_ij ∀(S_i, M_j) ∈ S × M and p_j, d_j, b_j ∀M_j ∈ M
while true do
    for each available mission M_j do
        u_j ← Σ_{unused S_i} e_ij
    j ← arg max_j p_j(u_j)
    if p_j(u_j) = 0 then break
    u_j ← 0; c_j ← 0
    for each unused S_i in decreasing order of e_ij/c_ij do
        if u_j ≥ d_j or e_ij = 0 then break
        if c_j + c_ij ≤ b_j then
            assign S_i to M_j
            u_j ← u_j + e_ij
            c_j ← c_j + c_ij
OUTPUT: sensor assignment

Algorithm 6.5 Multi-round GAP algorithm for budget-constrained SMD
INPUT: S, M, e_ij, c_ij ∀(S_i, M_j) ∈ S × M, p_j, d_j, b_j ∀M_j ∈ M, and T
initialize set of remaining missions M ← {M_1, ..., M_m}
for t = 0 to T step 0.05 do
    run the GAP subroutine on M and the unassigned sensors
    in the resulting solution, release any superfluous sensors
    if M_j's satisfaction level is < t, for any j then
        release all sensors assigned to M_j
        M ← M − {M_j}
    if M_j is completely satisfied OR has no remaining budget, for any j then
        M ← M − {M_j}
OUTPUT: sensor assignment


The algorithm works as follows: The threshold is initialized to a small value, for example, 5%. 
In each round, we run the GAP algorithm of [4] as a subroutine and find the solution based on 
the remaining sensors and missions. After each round, missions not meeting the threshold are 
removed, and their sensors are released. Any sensors assigned to a mission that has greater than 
100% satisfaction, and which can be released without reducing the percentage below 100%, are 
released. (Such sensors are superfluous.) Sensors assigned to missions meeting the threshold remain 
assigned to those missions. These sensors will not be considered in the next round, in which the 
new demands and budgets of each mission will become the remaining demand and the remaining 
budget of each one of them. Finally, the threshold is incremented, with rounds continuing until 
all sensors are used or all missions have succeeded or been removed. 
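
The release of superfluous sensors can be sketched as below. The text only states that a sensor is superfluous if removing it keeps the mission at or above 100% satisfaction; the order in which candidates are tried (smallest contribution first) is an assumption made for this illustration, as are the function and argument names.

```python
def release_superfluous(assigned, demand):
    """Split a mission's sensors into (kept, released).

    assigned: dict sensor id -> utility that sensor contributes to the mission.
    demand: the mission's utility demand.
    """
    kept = dict(assigned)
    released = []
    total = sum(kept.values())
    # Assumed heuristic: try to drop the smallest contributors first.
    for sensor in sorted(assigned, key=assigned.get):
        if total - kept[sensor] >= demand:
            total -= kept.pop(sensor)
            released.append(sensor)
    return kept, released


if __name__ == "__main__":
    print(release_superfluous({"a": 3.0, "b": 2.0, "c": 1.0}, demand=4.0))
```

Released sensors return to the pool of unassigned sensors and can be offered to other missions in the next round.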

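The GAP instance solved at each round is defined by the following integer linear program:

\[
\begin{aligned}
\text{Maximize:} \quad & \sum_{j} \sum_{i} p_{ij} x_{ij} \quad (\text{with } p_{ij} = p_j e_{ij}/d_j) \\
\text{Such that:} \quad & \sum_{S_i \text{ unused}} x_{ij} c_{ij} \le b_j, && \text{for each remaining } M_j, \\
& \sum_{M_j \text{ remaining}} x_{ij} \le 1, && \text{for each unused } S_i, \text{ and} \\
& x_{ij} \in \{0, 1\} \;\; \forall x_{ij}.
\end{aligned}
\]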


Here d_j is the remaining demand of M_j, that is, the demand minus utility received from sensors
assigned to it during previous rounds. Similarly, b_j is the remaining budget of M_j. The concepts
of demand and profit are encoded in the GAP model as p_ij = p_j · e_ij/d_j. This parameter represents
the fraction of demand satisfied by the sensor, scaled by the priority of the mission. In each GAP
computation, an assignment of sensors that maximizes the total benefit brought to the demands of
the remaining missions is sought.
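
As a small worked illustration, the GAP profit coefficients p_ij = p_j · e_ij/d_j can be computed as below. The function name and list-based layout are assumptions; missions with no remaining demand are given zero coefficients here, although in the algorithm such missions would already have been removed.

```python
def gap_profits(p, e, remaining_demand):
    """Return the matrix p_ij = p_j * e_ij / d_j (rows: sensors, columns: missions)."""
    return [
        [p[j] * e_i[j] / remaining_demand[j] if remaining_demand[j] > 0 else 0.0
         for j in range(len(p))]
        for e_i in e
    ]


if __name__ == "__main__":
    p = [10.0, 4.0]
    e = [[1.0, 0.5], [0.0, 2.0]]
    print(gap_profits(p, e, remaining_demand=[2.0, 1.0]))
```

A sensor that covers half of a mission's remaining demand thus earns half of that mission's profit in the GAP objective.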

One advantage of MRGAP is that it can be implemented in a distributed fashion. For each 
mission there can be a sensor, close to the location of the mission, that is responsible for running 
the assignment algorithm. Missions that do not contend for the same sensors can run the knapsack 
algorithm simultaneously. If two or more missions contend for the same sensors, that is, they are
within distance 2R_s of one another, then synchronization of rounds is required to prevent them from
running the knapsack algorithm at the same time. To do this, one of the missions (e.g., the one
with the lowest id) can be responsible for broadcasting a synchronization message at the beginning
of each new round. However, since R_s is typically small compared to the size of the field, it can be
expected that many missions will be able to do their computations simultaneously.
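
One way to picture the synchronization requirement is to group missions whose locations are within 2R_s of one another, directly or through a chain of such missions. The sketch below does this with a simple union-find structure; treating contention as transitive and electing the lowest id in each group as the synchronizer are assumptions drawn from the example in the text, and all names are illustrative.

```python
import math


def contention_groups(positions, sensing_range_m=30.0):
    """Group mission indices that must synchronize their rounds."""
    n = len(positions)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a in range(n):
        for b in range(a + 1, n):
            # Missions closer than 2 * R_s may compete for the same sensors.
            if math.dist(positions[a], positions[b]) <= 2 * sensing_range_m:
                parent[find(a)] = find(b)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    missions = [(0, 0), (50, 0), (100, 0), (300, 300)]
    for group in contention_groups(missions):
        print("synchronizer:", min(group), "members:", group)
```

Groups of size one can run their knapsack computations without any synchronization messages.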

The total running time of the algorithm depends on the threshold T and the step value chosen,
as well as on the density of the problem instance, which will determine to what degree the knapsack 
computations in each round can be parallelized. 


6.3.3 Performance Evaluation 


In this section, we discuss the performance evaluation results that were generated using a custom- 
built Java simulator tested with randomly generated problem instances [6]. 


6.3.3.1 Simulation Setup 


In this experiment, all missions occur simultaneously. It is assumed that mission demands are 
chosen from an exponential distribution with an average of 2 and a minimum of 0.5. Profits for 
the different missions are also exponentially distributed with an average of 10 and a maximum of 
100. This simulates realistic scenarios in which many missions demand few sensing resources and a 
smaller number demand more resources. The same applies to profit. The simulator filters out any 
mission that is not individually satisfiable, that is, satisfiable in the absence of all other missions. 
For a sufficiently dense network, however, it can be expected that there will be few such impossible 
missions. Nodes are deployed in uniformly random locations in a 400 m x 400 m field. Missions 
are created in uniformly random locations in the field. The communication range of all sensors 
is 40 m. 

The utility of a sensor S_i to a mission M_j is defined as a function of the distance D_ij between
them. In order for sensors to evaluate their utilities to missions, it is assumed that all sensors know
their geographical locations. Formally, the utility is

\[
e_{ij} =
\begin{cases}
\dfrac{1}{1 + D_{ij}^{2}/c}, & \text{if } D_{ij} \le R_s, \\
0, & \text{otherwise,}
\end{cases}
\]

where R_s is the global sensing range. This follows typical signal attenuation models in which signal
strength decays inversely with distance squared. In the experiments, c = 60 and R_s = 30 m.

The number of sensors in the field is fixed and the number of missions is varied from 10 to 100. 
Each sensor has a cost, chosen uniformly at random from [0, 1], which does not depend on the 
mission it is assigned to. This can represent the sensor’s actual cost in real money or, for example, 
a value indicating the risk of discovery if the sensor is activated. Each mission has a budget drawn 
from a uniform distribution with an average of 3 in the first experiment and varying from 1 to 10 
in the second. 
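
A random problem instance along these lines could be generated as in the sketch below. Several details are not given in the text and are assumptions here: the demand minimum and profit maximum are enforced by clamping, and a budget with "an average of 3" is drawn uniformly from [0, 6]. The function name and dictionary layout are hypothetical, not those of the Java simulator.

```python
import random


def generate_instance(n_sensors, n_missions, avg_budget=3.0, field_m=400.0, seed=0):
    """Return (sensors, missions) as lists of dicts."""
    rng = random.Random(seed)

    def random_point():
        return (rng.uniform(0.0, field_m), rng.uniform(0.0, field_m))

    sensors = [
        {"pos": random_point(), "cost": rng.uniform(0.0, 1.0)}
        for _ in range(n_sensors)
    ]
    missions = [
        {
            "pos": random_point(),
            "demand": max(rng.expovariate(1.0 / 2.0), 0.5),     # mean 2, min 0.5 (clamped)
            "profit": min(rng.expovariate(1.0 / 10.0), 100.0),  # mean 10, max 100 (clamped)
            "budget": rng.uniform(0.0, 2.0 * avg_budget),       # assumed range [0, 2 * average]
        }
        for _ in range(n_missions)
    ]
    return sensors, missions


if __name__ == "__main__":
    sensors, missions = generate_instance(250, 50)
    print(len(sensors), len(missions))
```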


6.3.3.2 Results 


The first series of results shows the fraction of the maximum mission profits (i.e., the sum of all
missions' profits) achieved by the different algorithms. We see profit results for the greedy algorithm,
MRGAP, and an upper bound on the optimal value running on two classes of sensor networks,
sparse (250 nodes) and dense (500 nodes) (Figures 6.15 and 6.16, respectively). The upper bound
on the optimal is obtained by solving the LP relaxation of program P, in which all decision variables
are allowed to take on fractional values in the range [0, 1], and the profit is simply fractional based
on satisfaction fraction, that is, p_j y_j for mission M_j, with no attention paid to the threshold T. The
actual optimal value can be no higher than the fractional one.

Figure 6.15 Fraction of maximum profit achieved (250 nodes).
Figure 6.16 Fraction of maximum profit achieved (500 nodes).

The MRGAP algorithm, which, recall, can be implemented in a distributed fashion, achieves
higher profits in all cases than does the greedy algorithm, which is strictly centralized (because
missions have to be ordered in terms of profit). The difference, however, is not very large. It can be
noted that with 500 nodes the network is able to achieve higher profits, which is expected.

Figure 6.17 shows the fraction of the total budget each algorithm spent to acquire its sensing
resources in a network with 250 nodes. The MRGAP algorithm achieves more profit than the
greedy algorithm and spends a modest amount of additional resources. The fraction of remaining
budget is significant (more than 60% in almost all cases), which suggests either that successful
missions had higher budgets than they could spend on available sensors or that unsuccessful
missions had lower budgets than necessary and hence were not able to reach the success
threshold, so their budgets were not spent. When the number of missions is large, this can be
attributed to the fact that there simply were not enough sensors due to high competition between
missions.

Figure 6.17 Fraction of spent budget (250 nodes).
Figure 6.18 Varying the average budget (250 nodes).

Another set of experiments, in which the number of missions was fixed at 50 and the average budget
given to missions was varied from 1 to 10, was also performed. Figure 6.18 shows the results for
a network with 250 nodes. We observe that the achieved profit initially increases rapidly with the
budget size but slows as the dominant influence shifts from budget limitations to competition between
missions. We observe the same pattern as before: MRGAP achieves the highest profits, followed closely by the
greedy algorithm. Figures 6.19 and 6.20 show the same results when 500 nodes are deployed in
the same area.

From these results we can see that in dense networks both algorithms perform well, with the
GAP-based algorithm slightly outperforming the greedy algorithm.

Figure 6.19 Fraction of spent budget (500 nodes).
Figure 6.20 Varying the average budget (500 nodes).


6.4 Sensor Assignment in Lifetime-Constrained Environments


In this section, we focus on lifetime-constrained environments. In this dynamic setting, missions 
may start at different times and have different durations. In these cases, explicit budgets, like the
ones discussed in Section 6.3, may be too restrictive because the network must react to new missions 
given the current operating environment, that is, the condition of the sensors will change over 
time. Instead, battery lifetime is used as a means of discouraging excessive assignment of sensors 
to any one mission, evaluating trade-offs between the relative value of a given assignment and the 
expected profit earned with each sensor (given the mission statistics). 

There are two cases in this dynamic setting. In the first case, no advance knowledge of the
target network lifetime is assumed, that is, we do not know for how long the network will be
required to operate. This is called the general dynamic setting. In the second case, it is assumed
that the system operator has knowledge of the target network lifetime, that is, the network is
needed for a finite duration. This is called the dynamic setting with a time horizon. In the following
algorithms, the aggressiveness with which sensors accept new missions is adjusted based on
trade-offs between target network lifetime, remaining sensor energy, and mission profit, rather than using
hard budgets.

In the following we discuss distributed algorithms that adjust sensors’ eagerness to participate in 
new missions based on their current operational status and the target network lifetime (if known). 
This section is based on [6]. 


6.4.1 Problem Definition 


In this dynamic setting, we no longer consider the explicit budgets for missions and explicit costs
for sensor assignments that were used in Section 6.3. We keep, however, the same network
model and the basic structure of the problem. In this case, what constrains the assignment problem
is the limited energy that sensors have. The other essential change is the introduction of a time
dimension. In this setting, each sensor has a battery size B, which means that it may only be used 
for at most B timeslots over the entire time horizon. Missions may arrive at any point in time and 
may last for any duration. 

If a sensor network is deployed with no predetermined target lifetime, then the goal may be to 
maximize the profit achieved by each sensor during its own lifetime. However, if there is a finite 
target lifetime for the network, then the goal is to earn the maximum total profits over the entire 
time horizon. The profit for a mission that lasts for multiple timeslots is the sum of the profits 
earned over all timeslots during the mission’s lifetime. 

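The danger in any particular sensor assignment is then that the sensor in question might
somehow be better used at a later time. Therefore, the challenge is to find a solution that competes
with an algorithm that knows the characteristics of all future missions before they arrive. The
general dynamic problem is specified by the following MP P′:

\[
\begin{aligned}
\text{Maximize:} \quad & \sum_{t} \sum_{j=1}^{m} p_j(y_{jt}) \\
\text{Such that:} \quad & \sum_{i=1}^{n} x_{ijt} e_{ij} \ge d_j y_{jt}, && \text{for each } M_j \text{ and time } t, \\
& \sum_{j=1}^{m} x_{ijt} \le 1, && \text{for each } S_i \text{ and time } t, \\
& \sum_{t} \sum_{j=1}^{m} x_{ijt} \le B, && \text{for each } S_i, \\
& x_{ijt} \in \{0, 1\} \;\; \forall x_{ijt}, \text{ and} \\
& y_{jt} \in [0, 1] \;\; \forall y_{jt}.
\end{aligned}
\]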
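
The battery constraint of the dynamic program can be checked with a few lines of code. In this sketch a schedule maps each sensor to a list with one entry per timeslot (a mission id, or None when idle), so the "one mission per sensor per timeslot" constraint holds by construction; the representation and the function name are assumptions for illustration.

```python
def respects_battery(schedule, B):
    """True if no sensor is active in more than B timeslots."""
    return all(
        sum(mission is not None for mission in slots) <= B
        for slots in schedule.values()
    )


if __name__ == "__main__":
    schedule = {
        "s1": [0, 0, None, 1],          # active in three timeslots
        "s2": [None, None, None, None]  # idle throughout
    }
    print(respects_battery(schedule, B=3), respects_battery(schedule, B=2))
```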
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If preemption is allowed, that is, a new mission is allowed to preempt an ongoing mission and 
grab some of its sensors, then in each timeslot sensors that are assigned to other missions can be 
freely reassigned based on the arrival of new missions, without reassignment costs. In this case, a 
long mission can be thought of as a series of unit-time missions, and so the sensors and missions 
at each timeslot form an instance of the NP-hard static problem. If preemption is forbidden, then 
the situation for the online algorithm is in a way simplified. If we assume without loss of generality 
that no two missions will arrive at exactly the same time, then the online algorithm can focus on 
one mission at a time. Nonetheless, the dynamic problem remains as hard as the static problem, 
since a reduction can be given in which the static missions are identified with dynamic missions of 
unit length, each starting $\epsilon$ after the previous one. In fact, we can give a stronger result covering
settings both with and without preemption. 


6.4.2 Algorithms 


In this section, we discuss heuristic-based algorithms that intelligently and dynamically assign 
sensors to missions. These heuristics are similar in operation to the dynamic proposal algorithm 
discussed in Section 6.2, but with a new focus. Rather than maximizing profit by trying to satisfy 
all available missions, the focus here is on maximizing the profit over network lifetime by allowing 
the sensors to refuse participation in missions they deem not worthwhile. 

Missions are dealt with as they arrive. A mission leader, a node that is close to the mission’s 
location, is selected for each mission. The mission leaders are informed about their missions’ 
demands and profits by a base station. They then run a local protocol to match nearby sensors to 
their respective missions. Since the utility a sensor can provide to a mission is limited by sensing 
range, only nearby nodes are considered. The leader advertises its mission information (demand 
and profit) to the nearby nodes (e.g., two-hop neighbors). 

When a nearby sensor hears such an advertisement message, it makes a decision either to propose 
to the mission and become eligible for selection by the leader or to ignore the advertisement. The 
decision is based on the current state of the sensor (and the network if known) and on potential 
contribution to mission profit that the sensor would be providing. Knowledge of the (independent) 
distributions of the various mission properties (namely, demand, profit, and lifetime) is needed 
to make proper assignment decisions. Such information can be learned from historical data. To 
determine whether a mission is worthwhile, a sensor considers a number of factors: 


Mission’s profit, relative to the maximum profit 

Sensor's utility to the mission, relative to the mission’s demand 
Sensor's remaining battery level 

Remaining target network lifetime, if known 


After gathering proposals from nearby sensors, the leader selects sensors based on their utility 
offers until it is fully satisfied or there are no more sensor offers. The mission (partially) succeeds if 
it reaches the success threshold; if not, it releases all sensors. 
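
The selection step performed by the leader can be sketched as follows. This is only an illustrative sketch: the highest-offer-first ordering, the representation of the success threshold as a fraction of the demand, and the names `select_sensors`, `offers`, `demand`, and `threshold` are assumptions made here, not details given in the text.

```python
def select_sensors(offers, demand, threshold):
    """Greedy leader-side selection (illustrative sketch).

    offers    -- dict mapping a sensor id to its utility offer
    demand    -- utility demanded by the mission
    threshold -- assumed success threshold, as a fraction of the demand
    Returns (chosen sensor ids, achieved satisfaction in [0, 1]).
    """
    chosen, total = [], 0.0
    # Assumption: the leader considers the largest offers first.
    ranked = sorted(offers.items(), key=lambda kv: kv[1], reverse=True)
    for sensor_id, utility in ranked:
        if total >= demand:
            break  # mission fully satisfied
        chosen.append(sensor_id)
        total += utility
    satisfaction = min(total / demand, 1.0)
    if satisfaction < threshold:
        return [], 0.0  # mission fails and releases all sensors
    return chosen, satisfaction
```

With offers of 0.5, 0.4, and 0.3 and a demand of 0.8, the sketch stops after the two largest offers; a lone offer of 0.2 against a demand of 1.0 and a threshold of 0.5 leads to the release of all sensors.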

Since it is assumed that all distributions are known, the share of mission profit potentially 
contributed by the sensor (i.e., if its proposal is accepted) can be compared to the expectation of 
this value. Based on previous samples, the expected mission profit E[p] and demand E[d] can 
be estimated. Also, knowing the relationship between sensor-mission distance and edge utility,
and assuming a uniform distribution on the locations of sensors and missions, the expected utility
contribution E[u] that a sensor can make to a typical mission in its sensing range can be computed.
The following expression is used to characterize the expected partial profit a sensor provides to a
typical mission:


\[
\frac{E[u]}{E[d]} \times \frac{E[p]}{P} \qquad (6.2)
\]

Here $P$ denotes the maximum mission profit, so that $E[p]/P$ is the expected profit relative to the maximum.


We consider two scenarios. In the first, the target network lifetime is unknown, that is, we 
do not know how long the network will be needed. In this case, sensors choose missions
that provide higher profit than the expected value and hence try to last longer in anticipation 
of future high profit missions. In the second, the target network lifetime is known, that is, 
we know the duration for which the network will be required. In this case, sensors take the 
remaining target network lifetime into account along with their expected lifetime when deciding 
whether to propose to a mission or not. In the following we describe solutions to these two 
settings. 


6.4.2.1 Energy-Aware Algorithm 


In this algorithm, the target lifetime of the sensor network is unknown. For a particular sensor and
mission, the situation is characterized by the actual values of mission profit ($p$) and demand ($d$) and
by the utility offer ($u$), as well as the fraction of the sensor’s remaining energy ($f$). For the current
mission, a sensor computes this value:

\[
\frac{u}{d} \times \frac{p}{P} \times f \qquad (6.3)
\]

Each time a sensor becomes aware of a mission, it evaluates expression (6.3). It makes an
offer to the mission only if the value computed is greater than expression (6.2). By weighting
the actual profit of a sensor in (6.3) by the fraction of its remaining battery value, the sensors 
start out eager to propose to missions, but become increasingly selective and cautious over time, 
as their battery levels decrease. The lower a sensor’s battery gets, the higher relative profit it will 
require before proposing to a mission. Since different sensors’ batteries will fall at different rates, 
in a dense network it is expected that most feasible missions will still receive enough proposals to 
succeed. 
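
The proposal rule of the energy-aware algorithm can be sketched as below. The sketch assumes that expression (6.2) has the form E[u]/E[d] × E[p]/P and expression (6.3) the form u/d × p/P × f, with P the maximum mission profit; the function and parameter names are hypothetical, and the numeric values in the usage note (other than the averages d = 1.2 and p = 10.9 reported later in the text) are made up for illustration.

```python
def expected_partial_profit(exp_u, exp_d, exp_p, max_profit):
    """Assumed form of expression (6.2): E[u]/E[d] * E[p]/P."""
    return (exp_u / exp_d) * (exp_p / max_profit)


def energy_aware_value(u, d, p, max_profit, f):
    """Assumed form of expression (6.3): u/d * p/P * f."""
    return (u / d) * (p / max_profit) * f


def should_propose_energy_aware(u, d, p, max_profit, f, exp_u, exp_d, exp_p):
    """Propose only if (6.3) exceeds (6.2) for the current mission."""
    value = energy_aware_value(u, d, p, max_profit, f)
    return value > expected_partial_profit(exp_u, exp_d, exp_p, max_profit)
```

For a mission with u = 0.6, d = 1.2, p = 10.9, a hypothetical maximum profit of 20 and a hypothetical E[u] = 0.3, a sensor with a full battery (f = 1.0) proposes, whereas the same sensor at f = 0.4 declines. This is the increasing selectiveness described above.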


6.4.2.2 Energy and Lifetime-Aware Algorithm 


If the target lifetime of the network is known, then sensors can take it into account when making 
their proposal decisions. To do this, a sensor needs to compute the expected occupancy time $t_\alpha$, the
amount of time a sensor expects to be assigned to a mission during the remaining target network 
lifetime. To find this value, it is required to determine how many missions a given sensor is 
expected to see. Using the distribution of mission locations, we can compute the probability that 
a random mission lies within a given sensor’s range. Combining this with the remaining target 
network lifetime and arrival rate of missions, we can find the expected number of missions to which 
a given sensor will have the opportunity to propose. Thus if the arrival rate and the (independent) 
distributions of the various mission properties are known, we can compute $t_\alpha$ as follows:


\[
t_\alpha = \tau \times \lambda \times g \times \gamma \times E[l]
\]

where

$\tau$ is the remaining target network lifetime, that is, the initial target network lifetime minus
current elapsed time

$\lambda$ is the mission arrival rate

$g = \pi R_s^2 / A$ is the probability that a given mission location (chosen uniformly at random) lies
within sensing range, $R_s$ is the sensing range, and $A$ is the area of the deployment field

$E[l]$ is the expected mission lifetime

$\gamma$ is the probability that a sensor’s offer is accepted*
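
As a small worked sketch, the expected occupancy time can be computed as the product of the five quantities listed above. The function names are hypothetical, and the sensing range (30 m) and field size (400 m × 400 m) used in the usage note are illustrative assumptions; only the acceptance probability of 0.25, the arrival rate of 4 missions/h, the 3-day target lifetime, and the average mission lifetime of about 1 h come from the text.

```python
import math


def in_range_probability(sensing_range, field_area):
    """Probability that a uniformly placed mission lies within sensing range."""
    return math.pi * sensing_range ** 2 / field_area


def expected_occupancy_time(remaining_lifetime, arrival_rate, in_range_prob,
                            accept_prob, exp_mission_lifetime):
    """Product of remaining target lifetime, arrival rate, in-range
    probability, acceptance probability, and expected mission lifetime."""
    return (remaining_lifetime * arrival_rate * in_range_prob
            * accept_prob * exp_mission_lifetime)


# Hypothetical geometry: 30 m sensing range in a 400 m x 400 m field.
g = in_range_probability(30.0, 400.0 * 400.0)
# 72 h remaining, 4 missions/h, acceptance probability 0.25, 1 h missions.
t_alpha = expected_occupancy_time(72.0, 4.0, g, 0.25, 1.0)
```

Under these assumed values a sensor expects to be occupied for a little over one hour during the remaining three days.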


For each possible mission, the sensor now evaluates an expression which is modified from (6.2). 
The sensor considers the ratio between its remaining lifetime and its expected occupancy time. If $t_b$
is the amount of time a sensor can be actively sensing, given its current energy level, the expression
then becomes


\[
\frac{u}{d} \times \frac{p}{P} \times \frac{t_b}{t_\alpha} \qquad (6.4)
\]

If the value of expression (6.4) is greater than that of expression (6.2), then the sensor proposes
to the mission. Moreover, if the sensor’s remaining lifetime $t_b$ is greater than its expected
occupancy time $t_\alpha$, the sensor proposes to any mission since in this case it expects to survive until
the end of the target time. The effect on the sensor’s decision of weighting the mission profit by
the ratio $t_b/t_\alpha$ is similar to the effect that weighting by the fraction of remaining energy ($f$) had in
expression (6.3); all things being equal, less remaining energy makes a sensor more reluctant to
propose to a mission. As the network approaches the end of its target lifetime, however, this ratio 
will actually increase, making a sensor more willing to choose missions with profits less than what 
it “deserves” in expectation. After all, there is no profit at all for energy conserved past the target 
lifetime. 
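
The decision rule of the energy and lifetime-aware algorithm can be sketched in the same way. The sketch assumes that expression (6.4) has the form u/d × p/P × (remaining battery time / expected occupancy time) and that it is compared against E[u]/E[d] × E[p]/P; the names `should_propose_el_aware`, `t_b`, and `t_alpha` and the guard for an exhausted battery are assumptions added here.

```python
def should_propose_el_aware(u, d, p, max_profit, t_b, t_alpha,
                            exp_u, exp_d, exp_p):
    """Energy and lifetime-aware proposal rule (illustrative sketch).

    t_b     -- time the sensor can still sense with its current energy
    t_alpha -- expected occupancy time over the remaining target lifetime
    """
    if t_b <= 0:
        return False  # assumption: an exhausted sensor never proposes
    if t_b > t_alpha:
        return True   # sensor expects to survive: propose to any mission
    value = (u / d) * (p / max_profit) * (t_b / t_alpha)
    expected = (exp_u / exp_d) * (exp_p / max_profit)
    return value > expected
```

With the same hypothetical mission as before (u = 0.6, d = 1.2, p = 10.9, maximum profit 20, E[u] = 0.3), a sensor with 0.5 h of battery declines while its expected occupancy time is 1.27 h, but accepts once the expected occupancy time has shrunk to 0.6 h near the end of the target lifetime.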


6.4.3 Performance Evaluation 


The algorithms were tested using a simulator similar to the one discussed in Section 6.3. In the 
dynamic problem, however, missions arrive without warning over time, and the sensors used to 
satisfy them have limited battery lives. The goal is to maximize the total profit achieved by missions 
over the entire duration of the network. In this section, the dynamic heuristic algorithms are 
tested on randomly generated sensor network histories in order to gauge the algorithms’ real-world
performance. 

The energy-aware (E-aware) and the energy and lifetime-aware (E/L-aware) algorithms are 
compared with a basic algorithm (Basic) that does not take energy or network lifetime into account 
when making the decision on to which mission to propose (i.e., sensors propose to any mission in 
their range). For comparison purposes, the performance of a network with infinite-energy batteries (i.e.,
B > simulation time) is also shown. 

Since finding the true optimal performance value is NP-hard, an exact comparison is not 
possible. A standard strategy in this kind of situation is to compare with an LP relaxation of the 
offline IP, that is, of program P of Section 6.4.1. This is a relaxation in the sense of allowing 
fractional values for the decision variables, as well as allowing full preemption. Its solution value 
necessarily upper bounds the true offline optimal value, just as the LP relaxation of program P
did for the experiments in the static setting presented in Section 6.3.

* Computing the acceptance probability would imply a circular dependency; in the simulations it was chosen a priori to be 0.25.

Although solving such an
LP is theoretically tractable, the size of the LP program is significant, with a decision variable 
for each (S;, Mj, t) triple. Even with a problem instance as small as 100 sensors and 50 missions, 
the LP solver used [14] requires a gigabyte of memory to perform the computation. Clearly, solving
more significant problem sizes would be impractical by this method. Instead, a further relaxation,
program P, is used, which also upper bounds the optimal:


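\[
\begin{aligned}
\text{Maximize:}\quad & \sum_{j=1}^{m} p_j\, l_j\, y_j \\
\text{Such that:}\quad & \sum_{i=1}^{n} x_{ij}\, e_{ij} \ge d_j\, l_j\, y_j, && \text{for each mission } M_j, \\
& \sum_{j=1}^{m} x_{ij} \le b, && \text{for each sensor } S_i, \text{ and} \\
& x_{ij} \in [0, 1]\ \forall x_{ij} \quad \text{and} \quad y_j \in [0, 1]\ \forall y_j
\end{aligned}
\]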


This formulation condenses the entire history into a single timeslot. The profits and demands 
are multiplied by the duration of the mission $l$. Since time is elided, the sensor constraint now
asserts only that a sensor be used (fractionally) at most $b$ times, over the entire history. The optimal
solution of this LP will upper bound the optimal solution achievable in the original problem. Note 
that the solution value provided is the total profits over the entire history, not a time-series. This 
value is indicated in the plots by a straight line drawn at the average profit corresponding to this 
total. In the experiments, P was solved with a software package [14]. 


6.4.3.1 Simulation Setup 


The simulators use assumptions similar to the ones made in Section 6.3.3. In addition to those 
assumptions, here it is also assumed that mission arrivals are generated by a Poisson process, with
an average arrival rate of 4 missions/h or 8 missions/h, depending on the experiment. Each sensor
starts with a battery that will last for 2 h of continuous sensing (i.e., B = 7200 s). It is assumed
that the battery is used solely for sensing purposes, not for communication. Mission lifetimes are 
exponentially distributed, with an average of 1 h, a minimum of 5 min and a maximum of 4 h. The 
sensor network comprises 500 nodes. In order to reward cumulative and broad-based success, the 
simulator continues to award profit to a mission only so long as total achieved profit per-timestep 
is at least 50% of the optimal. 
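
A mission history of the kind described above could be generated along the following lines. This is a minimal sketch, not the simulator used for the experiments: the function name `generate_missions`, the use of hours as the time unit, and in particular the choice to enforce the minimum and maximum lifetimes by clamping (rather than, say, resampling) are assumptions.

```python
import random


def generate_missions(rate_per_hour, horizon_hours, seed=0,
                      mean_life_h=1.0, min_life_h=5 / 60, max_life_h=4.0):
    """Return a list of (arrival_time_h, lifetime_h) pairs.

    Arrivals follow a Poisson process (exponential inter-arrival times);
    lifetimes are exponential with the given mean, clamped to the bounds.
    """
    rng = random.Random(seed)
    missions, t = [], 0.0
    while True:
        t += rng.expovariate(rate_per_hour)  # next inter-arrival gap
        if t >= horizon_hours:
            break
        life = rng.expovariate(1.0 / mean_life_h)
        life = min(max(life, min_life_h), max_life_h)  # assumed clamping
        missions.append((t, life))
    return missions


# 3 days at 4 missions/h, as in the first set of experiments.
missions = generate_missions(4.0, 72.0, seed=1)
```

Note that clamping shifts the empirical mean away from the nominal 1 h, which is consistent with the remark in the results that the capped distributions do not reproduce their a priori averages.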


6.4.3.2 Results 


In both the E-aware and the E/L-aware algorithms, the expected profit of a mission (expression (6.2))
needs to be computed in order to determine whether the sensor should propose to the mission.
Because the demand and profit value distributions are capped, the actual averages are
not equal to the a priori averages of the distributions. Empirically, the average
demand was found to be d = 1.2 and the average profit p = 10.9. The empirical average duration, which is
used to compute the expected occupancy time, was found to be 3827.8 s (roughly 1 h).

Figure 6.21 shows the achieved profit (as a fraction of maximum available) per timestep. The 
results are the average of 200 runs. The target network lifetime is 3 days (shown as a fine vertical 
line), but the simulations continue for one full week. Knowledge of the target network lifetime is 
used by the E/L-aware and in the LP. The other algorithms assume potentially infinite duration. 
The 50% minimum profit is shown with a fine horizontal line.

Figure 6.21 Fraction of achieved profits (arrival rate = 4 missions/h).


From Figure 6.21 we see that the profits of all algorithms stay above the 50% threshold for the 
target lifetime. Basic achieves most of its profits in the beginning and then profits go down (almost 
linearly) as time progresses. The E-aware algorithm tries to conserve its resources for high profit 
missions. Because it ignores the fact that we care more about the first 3 days than anytime after 
that, it becomes overly conservative and ignores many missions. Such an algorithm is better suited 
to the case when there is no known target lifetime for the network and we want the network to last 
as long as possible. We see that the profit for E-aware does not fall below the 50% threshold until 
the end of the 6th day. 

In the E/L-aware algorithm, nodes will initially be aggressive in accepting missions that might 
not provide their expected value, but become more cautious as their energy is used. However, 
unlike E-aware, as their remaining expected occupancy time approaches their remaining lifetime, 
sensors will again accept missions with lower and lower profits. The curves for E-aware and 
E/L-aware cross by the middle of the 4th day, after which point E-aware dominates. When 
compared to the average LP solution, we see that E/L-aware does very well, within a few percentage 
points of the optimal (on average). In terms of total target lifetime profit (i.e., the area under 
the curve for the first 3 days), the E/L-aware was found to achieve about 84% of the profits 
compared to 72% for the E-aware. This means that E/L-aware achieves close to 17% higher 
profits. If the sensor’s battery lifetime is increased from 2 to 3 h, the percentage increase becomes 
about 22%. 
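
As a quick check of the relative gain quoted for the 2 h battery, the figure follows directly from the two reported fractions (the 84% and 72% values are taken from the text; the rounding is added here):

\[
\frac{0.84}{0.72} \approx 1.167, \quad \text{that is, roughly } 17\% \text{ higher profit for E/L-aware than for E-aware.}
\]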

The fraction of extant sensors (i.e., those sensors still alive, whose batteries are not yet exhausted) 
over time is shown in Figure 6.22. Because sensors propose to any mission within their range in
Basic, no matter how low the profit is, nodes start dying rapidly. By the end of the third day, only 
half the nodes can be used for sensing, and by the end of the seventh day this falls below 15%. In 
E-aware, nodes become very cautious as their batteries run low, which helps the network to last 
longer without significant sacrifice of achieved profits per timeslot. By the end of the 7 days, about 
72% of the nodes remain living. For E/L-aware, sensors accept more missions, and hence are used 
at a higher rate, as the target lifetime of the network approaches. In the figure, we can see this 
happening by the 2nd day, when the curve of E/L-aware diverges from that of E-aware. By the end 
of the 7th day, E/L-aware has used nearly as much energy as Basic.

Figure 6.22 Fraction of extant nodes (arrival rate = 4 missions/h).

Figure 6.23 Fraction of achieved profits (arrival rate = 8 missions/h).


One thing to note is that E/L-aware acts like Basic once the target lifetime of the network 
has passed, that is, under it sensors propose to all available missions. If this behavior changed to 
emulate E-aware’s, we could expect the energy usage, and the exhaustion of sensor batteries, to 
slow down. With more nodes remaining alive for longer times, the decrease in profit following the 
target network lifetime point would be less dramatic. 

Figures 6.23 and 6.24, respectively, show the fraction of achieved profit and fraction of extant 
nodes over time, with twice the previous arrival rate. Due to the increased number of missions, 
sensors are used more rapidly and hence both the profit and fraction of extant nodes decrease 
quickly. Basic passes the 50% profit line by the middle of the 2nd day and both E-aware and 
E/L-aware pass that point in the beginning of the 4th day. But by that point, E/L-aware achieves 
significantly higher profits than E-aware. Similar effects are seen on the fraction of extant nodes. 

Figure 6.24 Fraction of alive nodes (arrival rate = 8 missions/h).

Figure 6.25 Effect of initial battery time on profit (arrival rate = 4 missions/h).

Finally, Figure 6.25 shows the effect of the initial battery lifetime on the performance of both
the E-aware and E/L-aware in terms of achieved profits. Here, an arrival rate of 4 missions/h is
assumed and the fraction of the total profits achieved in the first 3 days is counted. Each of these
fractions can be thought of as the ratio between the area under the curve of the corresponding
algorithm and the left rectangle (the first 3 days) in Figure 6.21. The initial energy lifetime of sensors is varied
from 1 to 7 h. The effect of increasing the battery lifetime is most pronounced in the beginning. 
After all, the closer the amount of stored energy gets to the expected occupancy time, the greater 
the likelihood that energy will be left unspent. We also note that E/L-aware uses the increased 
battery lifetime more effectively since it takes both the battery and occupancy time into account. 
The energy-aware would only consider the fraction of used energy, which declines in influence as 
battery life increases. 

The performance evaluation results show that knowledge of energy level and network lifetime 
is an important advantage: the algorithm given both of these values significantly outperforms the 
algorithm using only the energy level and the algorithm that uses neither. Given knowledge of both 
energy and target lifetime, the algorithm can achieve profits 17% to 22% higher than if only energy
is known.


6.5 Conclusion and Research Directions


In this chapter, we discussed different aspects of the sensor-mission assignment problem in which the 
objective is to maximize the overall utility of the network. We defined the problem and considered 
it under different constraints; for each variation we discussed different algorithms to solve the 
problem. The limitations of sensor devices, such as energy and budget, were key considerations in 
the design process of the solutions we considered. Another consideration was the ad hoc nature 
of wireless sensor networks that makes centralized solutions hard to deploy; hence we considered 
distributed algorithms as well. 

There are several issues that remain open for research. In this chapter, we only considered 
directional sensors for which each sensor can only be assigned to a single mission. Directional 
sensors, however, can also serve multiple missions through time-sharing. For example, a mission 
may require an image of a target every 30 s. In this case, multiple missions can use the same camera 
by rotating the camera to different directions as long as it can meet the requirements of the different 
missions. 

Some sensors are omnidirectional, that is, they can sense from multiple directions. For example, 
the information provided by a single sensor that measures the ambient temperature can be used 
to support multiple missions given that they all lie within its sensing range. Although using only 
directional sensors can be more challenging because the utility from one sensor can only benefit a 
single mission, having omnidirectional sensors in the network can create a problem instance that 
is different from what we have considered in this chapter. Such a problem is typically easier
to solve in terms of finding a feasible solution that satisfies all missions. What may be challenging, 
however, is to optimize the sensor assignment so that we can find the smallest set of sensors that 
can satisfy the requirements of all missions, which is desirable to conserve resources. 

Another avenue for research is studying the effects of mobility on the solutions and how we can
design better solutions to utilize it. We can see two types of mobility: controlled and uncontrolled.
Controlled mobility is the ability to move all or some of the sensing resources in order to achieve
better assignments to missions. In uncontrolled mobility, the sensing resources might be mounted
on people (e.g., helmets of soldiers in a battlefield), in which case the system has no control over where
the sensors are to be moved.

Most sensor network applications aim at monitoring the spatiotemporal evolution of physical
quantities, such as temperature, light, or chemicals, in an environment. In these applications,
low-resource sensor nodes are deployed and programmed to collect measurements at a predefined
sampling frequency. Measurements are then routed out of the network to a network node with 
higher resources, commonly referred to as base station or sink, where the spatiotemporal evolution 
of the quantities of interest can be monitored. 

In many cases, the collected measurements exhibit high spatiotemporal correlations and follow 
predictable trends or patterns. An efficient way to optimize the data collection process in these 
settings is to rely on machine learning techniques, which can be used to model and predict the 
spatiotemporal evolution of the monitored phenomenon. 

This chapter presents a survey of the learning approaches that have been recently investigated
for reducing the amount of communication in sensor networks.
We have classified the approaches based on learning into three groups, namely, model-driven data 
acquisition, replicated models (RM), and aggregative approaches. 

In model-driven approaches, the network is partitioned into two subsets, one of which is used to 
predict the measurements of the other. The subset selection process is carried out at the base station, 
together with the computation of the models. Thanks to the centralization of the procedure, these 
approaches provide opportunities to produce both spatial and temporal models. Model-driven 
techniques can provide high energy savings as part of the network can remain in an idle mode. 
Their efficiency in terms of accuracy is, however, tightly dependent on the adequacy of the model 
to the sensor data. We present these approaches in Section 7.2. 

RM encompass a set of approaches where identical prediction models are run in the network 
and at the base station. The models are used at the base station to get the measurements of sensor 
nodes, and in the network to check that the predictions of the models are correct within some
user-defined error threshold $\epsilon$. A key advantage of these techniques is to guarantee that the
approximations provided by the models are within a strict error threshold $\epsilon$ of the true measurements. We review these
techniques in Section 7.3. 
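
The principle can be illustrated with a deliberately simple sketch in which the shared model is constant: both sides predict the last transmitted value, and the node transmits only when its reading deviates from that prediction by more than the threshold. The constant model and the name `run_replicated_constant_model` are assumptions chosen for brevity; the models actually used by RM approaches are those reviewed in Section 7.3.

```python
def run_replicated_constant_model(readings, eps):
    """Simulate one sensor node and the base station sharing a constant model.

    Returns (epochs at which an update was sent, base station approximations).
    """
    sent_epochs = []
    base_station_view = []
    model_value = None  # value currently predicted on both sides
    for epoch, reading in enumerate(readings):
        # The node checks the shared prediction against its true reading.
        if model_value is None or abs(reading - model_value) > eps:
            model_value = reading  # update transmitted to the base station
            sent_epochs.append(epoch)
        base_station_view.append(model_value)
    return sent_epochs, base_station_view
```

For the readings 20.0, 20.2, 20.4, 21.0, 21.1 and a threshold of 0.5, only the first and fourth readings are transmitted, yet every approximation held at the base station stays within 0.5 of the true value.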

Aggregative approaches reduce the amount of communication by combining data
within the network, and provide to a certain extent a mixture of the characteristics of model-driven
and RM approaches. They rely on the ability of the network routing structure to aggregate
information of interest on the fly, as the data are routed to the base station. As a result, the 
base station receives aggregated data that summarize in a compact way information about sensor 
measurements. The way data are aggregated depends on the model designed at the base station, and 
these approaches are therefore in this sense model driven. The resulting aggregates may, however, 
be communicated to all sensors in the network, allowing them to check the approximations 
against their actual measurements, as in RM approaches. We discuss aggregative approaches in 
Section 7.4. 


7.1 Problem Statement 


The main application considered in this chapter is periodic data collection, in which sensors take
measurements at regular time intervals and forward them to the base station [35]. Its main purpose 
is to provide the end-user, or observer, with periodic measurements of the whole sensor field.
This task is particularly useful in environmental monitoring applications where the aim is to follow
both over space and time the evolution of physical phenomena [29,35]. 


7.1.1 Definitions and Notations 


The set of sensor nodes is denoted by $\mathcal{S} = \{1, 2, \ldots, S\}$, where $S$ is the number of nodes. The
location of sensor node $i$ in space is represented by a vector of coordinates $c_i \in \mathbb{R}^d$, where $d$ is the
dimension of the space, typically 2 or 3. Besides the sensors and the battery, a typical sensor node
is assumed to have a CPU of a few megahertz, a memory of a few tens of kilobytes, and a radio
with a throughput of a few hundreds of kilobits per second [35]. 

The entity of interest that is being sensed is called the phenomenon. The vector of the measurements
taken in the sensor field at time $t$ is denoted by $s[t] = (s_1[t], s_2[t], \ldots, s_S[t]) \in \mathbb{R}^S$. The
variable $s[t]$ may be univariate or multivariate. Whenever confusion is possible, the notation $s_i[t]$
is used to specify the sensor node concerned, with $i \in \mathcal{S}$. The time domain is the set of natural
numbers $\mathbb{T} = \mathbb{N}$, and the unit of time is called an epoch, whose length is the time interval between
two measurements. 

Most of the prediction models considered in this chapter aim at approximating or predicting 
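sensor measurements. Denoting by $\hat{s}_i[t]$ the approximation of $s_i[t]$ at time $t$, the prediction models
will typically be of the form


\[
\begin{aligned}
h_\theta : \mathcal{X} &\to \mathbb{R} \\
x &\mapsto \hat{s}_i[t] = h_\theta(x)
\end{aligned}
\]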


The input x will in most cases be composed of the sensor measurements, sensor coordinates, 
and/or time. 


Example 7.1: 


In many cases, there exist spatiotemporal dependencies between sensor measurements. A model 
may, therefore, be used to mathematically describe the relationship existing between one sensor 
and a set of other sensors. For example, a model: 


\[
\begin{aligned}
h_\theta : \mathbb{R}^2 &\to \mathbb{R} \\
x = (s_j[t], s_k[t]) &\mapsto \hat{s}_i[t] = \theta_1 s_j[t] + \theta_2 s_k[t]
\end{aligned}
\]

can be used to approximate the measurement of sensor $i$ on the basis of a linear combination of
the measurements of sensors $j$ and $k$, with $i, j, k \in \mathcal{S}$, and the parameters $\theta = (\theta_1, \theta_2) \in \mathbb{R}^2$. The
use of such a model may make it possible to avoid collecting the measurement from sensor $i$ if the model is
assumed to be sufficiently accurate.
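
To make the example concrete, the two weights of such a model could be estimated from past readings and then used for prediction, as sketched below. Ordinary least squares is an assumption made here for illustration (the example itself does not prescribe an estimation method), and the function names and the temperature-like readings in the usage note are hypothetical.

```python
def fit_two_sensor_model(s_j, s_k, s_i):
    """Least-squares estimate of (theta_1, theta_2) in
    s_i[t] ~ theta_1 * s_j[t] + theta_2 * s_k[t], via the normal equations."""
    a = sum(x * x for x in s_j)
    b = sum(x * y for x, y in zip(s_j, s_k))
    c = sum(y * y for y in s_k)
    r1 = sum(x * z for x, z in zip(s_j, s_i))
    r2 = sum(y * z for y, z in zip(s_k, s_i))
    det = a * c - b * b
    if det == 0:
        raise ValueError("readings of sensors j and k are collinear")
    theta_1 = (r1 * c - b * r2) / det
    theta_2 = (a * r2 - b * r1) / det
    return theta_1, theta_2


def predict(theta, s_j_t, s_k_t):
    """Approximate the reading of sensor i from those of sensors j and k."""
    return theta[0] * s_j_t + theta[1] * s_k_t
```

If the past readings of sensor i happen to equal 0.4 times those of sensor j plus 0.6 times those of sensor k, the fit recovers these two weights, and sensor i no longer needs to report as long as the relationship is assumed to hold.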


Figure 7.1 Network load sustained by each sensor node in a network of nine sensors, for two
different routing trees. The bar plot (left) gives the total number of packets processed (receptions
and transmissions) by each node in the network (right). (a) First configuration and (b) Second
configuration.


Example 7.2:


A model may aim at approximating the scalar field of the measurements. The input domain is the
space and time domains $\mathbb{R}^d \times \mathbb{R}$, and the output domain is the set of real numbers, representing
an approximation of the physical quantity at one point of space and time. For example, a linear
combination of time and spatial coordinates $c_i \in \mathbb{R}^2$ in a two-dimensional space domain can be
used to create the following model:


\[
\begin{aligned}
h_\theta : \mathbb{R}^2 \times \mathbb{R} &\to \mathbb{R} \\
x = (c_i, t) &\mapsto \hat{s}_i[t] = x\,\theta^T
\end{aligned}
\]


where $\theta = (\theta_1, \theta_2, \theta_3) \in \mathbb{R}^3$ is the parameter vector containing the weights of the linear combination.
If the weights are known or can be estimated on the basis of past data, the model can be
used to get approximated measurements of the sensors without actually collecting data from the 
network. The model can, moreover, be used to predict the measurements at future time instants 
or at locations where sensors are not present. 
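
A minimal sketch of how such a model would be evaluated is given below; the function name `predict_field` and the weights used in the usage note are hypothetical, and the point chosen need not coincide with any sensor location.

```python
def predict_field(theta, coord, t):
    """Evaluate the linear space-time model at a 2-D location and a time:
    the dot product of the input (c_1, c_2, t) with the three weights."""
    x = (coord[0], coord[1], t)
    return sum(weight * value for weight, value in zip(theta, x))
```

For instance, with the illustrative weights (0.5, -0.2, 0.01), the approximation at location (2, 5) and epoch 100 is 0.5·2 - 0.2·5 + 0.01·100 = 1.0.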


Let us assume that the data are retrieved from the network by means of a routing tree as shown 
in Figure 7.1. The observer specifies the sampling frequency at which the measurements must be 
retrieved, using, for example, an aggregation service such as TAG or Dozer [3,24]. 


7.1.2 Network Load and Sensor Lifetime 


The purpose of prediction models is to trade data accuracy for communication costs. Prediction 
models are estimated by learning techniques, which use past observations to represent the
relationships between measurements by means of parametric functions. The use of prediction models
makes it possible to provide the observer with approximations $\hat{s}[t]$ of the true set of measurements $s[t]$ and to
reduce the amount of communication by either subsampling or aggregating data.

Given that radio activity is the main source of energy consumption, the reduction of the
use of radio is the main way to extend the lifetime of a sensor network. In qualitative terms, the 
lifetime is the time span from the deployment to the instant when the network can no longer 
perform the task [34]. The lifetime is application specific. It can be, for example, the instant when 
a certain fraction of sensors die, loss of coverage occurs (i.e., a certain portion of the desired area
can no longer be monitored by any sensor), or loss of connectivity occurs (i.e., sensors can no
longer communicate with the base station). For periodic data collection, the lifetime can be more
specifically defined as the number of data collection rounds until $\alpha$ percent of sensors die, where $\alpha$ is
specified by the system designer [31]. This definition makes the lifetime independent of the sampling
rate at which data are collected from the network. Depending on $\alpha$, the lifetime is, therefore,
somewhere in between the number of rounds until the first sensor runs out of energy and the 
number of rounds until the last sensor runs out of energy. 
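
Under this definition the lifetime can be read off the rounds at which individual sensors run out of energy, as in the sketch below. The function name `lifetime_rounds`, the input format, and the rounding up of the required number of dead sensors are assumptions made for illustration.

```python
import math


def lifetime_rounds(death_rounds, percent):
    """Round at which the given percentage of sensors has died.

    death_rounds -- round at which each sensor runs out of energy
    percent      -- designer-specified percentage, in (0, 100]
    """
    # Number of dead sensors needed to reach the percentage (at least one).
    k = max(1, math.ceil(len(death_rounds) * percent / 100))
    return sorted(death_rounds)[k - 1]
```

With five sensors dying at rounds 100, 250, 300, 400, and 1000, a very small percentage yields 100 (the first death), 40 percent yields 250, and 100 percent yields 1000 (the last death), matching the two extremes mentioned above.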

The communication costs related to the radio of a node $i$ are quantified by the network load $L_i$,
which is the sum of the number of received and transmitted packets during an epoch. Denoting
by $Rx_i$ and $Tx_i$ the number of packet receptions and packet transmissions for node $i$ during the
epoch of interest, we have


\[
L_i = Rx_i + Tx_i
\]


A network packet is assumed to contain one piece of information. A sensor transmitting its 
measurement and forwarding the measurement of another sensor therefore processes three packets
during an epoch, that is, one reception and two transmissions. 

Figure 7.1 illustrates how the network load is distributed among sensors during an epoch. The 
network loads are reported for each sensor on the left of Figure 7.1, for two different routing trees 
built such that the number of hops between the sensor nodes and the base station is minimized. 
Leaf nodes sustain the lowest load (only one transmission per epoch), whereas the highest load is 
sustained in both cases by the root node (8 receptions and 9 transmissions, totaling a network
load of 17 packets per epoch). 
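
The per-node loads can be reproduced for any routing tree with a few lines of code. In the sketch below the tree is given as a child-to-parent map in which the root's parent is `None` (standing for the base station); this representation, the function name `network_loads`, and the particular nine-node topology are assumptions and do not correspond to either configuration of Figure 7.1.

```python
def network_loads(parent):
    """Packets received plus transmitted by each node during one epoch,
    assuming every node sends one measurement and one measurement per packet."""
    tx = {node: 0 for node in parent}
    rx = {node: 0 for node in parent}
    for node in parent:
        hop = node
        # Follow the node's measurement hop by hop up to the base station.
        while hop is not None:
            tx[hop] += 1
            next_hop = parent[hop]
            if next_hop is not None:
                rx[next_hop] += 1
            hop = next_hop
    return {node: rx[node] + tx[node] for node in parent}


# Hypothetical nine-node tree rooted at node 1.
tree = {1: None, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3, 8: 4, 9: 6}
loads = network_loads(tree)
```

In this tree the leaves carry a load of 1, node 4 (which forwards one other measurement) carries 3, and the root carries 17, in line with the figures quoted above.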

In data collection, the nodes that are likely to have the shortest
lifetime are the nodes close to the base station, as their radio activity is increased by the forwarding
of data. The lifetime of these nodes is therefore closely related to the network lifetime, since once
these nodes have run out of energy, the rest of the network gets out of communication range of
the base station. Particular attention is therefore given in this chapter to the number of packets
received and transmitted by the root node. More generally, we will often aim at quantifying the
upper bound


\[
L_{\max} = \max_{i \in \mathcal{S}} L_i \qquad (7.1)
\]


of the distribution of the network loads in the network. Most of the methods and techniques
discussed in this chapter aim at reducing this quantity, which will be referred to as the highest
network load. In order to account for the effects of collisions, interference, or radio malfunctioning,
we will use orders of magnitude instead of precise packet counts. Without the use of
learning strategies, the order of magnitude of the highest network load is


$$L_{\max} \sim O(S)$$


where S is the number of nodes. 


7.1.3 Data Accuracy 


The quantification of the error implied by a prediction model is an important practical issue, as 
in many cases the observer needs to know how far the predictions obtained at the base station are 
from the true measurements. Three levels of accuracy are encountered in this chapter.
First, probabilistic bounded approximation errors will refer to approximations where


$$P(|\hat{s}_i[t] - s_i[t]| < \epsilon) = 1 - \delta, \quad \forall i \in S, \; t \in T \qquad (7.2)$$


which guarantees that, with probability $1 - \delta$, approximations do not differ by more than $\epsilon$ from
the true measurements. The observer can set the error threshold $\epsilon$ and the probability guarantee $\delta$.
Second, bounded approximation errors will refer to approximations where


$$|\hat{s}_i[t] - s_i[t]| < \epsilon, \quad \forall i \in S, \; t \in T \qquad (7.3)$$


which assures the observer that all approximations $\hat{s}_i[t]$ obtained at the base station are within $\pm\epsilon$ of
the true measurement $s_i[t]$. This level of accuracy is the highest, as it allows the observer to precisely
define the tolerated error threshold.

Finally, unbounded errors will refer to modeling schemes where there is no bound on the difference between the
approximations $\hat{s}_i[t]$ obtained at the base station and the true measurement $s_i[t]$ taken by sensor $i$.


7.2 Model-Driven Acquisition 


In model-driven data acquisition [6,7,15,21,22,37], a model of the sensor measurements is estimated
at the base station and used to optimize the acquisition of sensor readings. The rationale of
the approach is to acquire data from sensors only if the model is not sufficiently rich to provide the
observer with the requested information. An overview of the approach is presented in Figure 7.2.
In the first stage, measurements are collected over $N$ epochs from the whole network at the base
station, and stored in a matrix $X$ of dimension $N \times S$, where column $j$ contains the measurements
from sensor $j$ over the $N$ epochs, and row $i$ contains the measurements from the whole network
during the $i$th epoch. The data set $X$ is then used to estimate a model able to answer the users'
queries without collecting data from the whole network. More precisely, the model-driven system
aims at finding a subset of sensors $S_q \subset S$ from which the measurements of the other sensors
$S_p = S \setminus S_q$ can be predicted. The subscripts $q$ and $p$ refer to the queried and predicted subsets, respectively.
Once the subsets $S_q$ and $S_p$ have been identified, measurements are only collected from the
sensors in $S_q$.


[Figure 7.2 diagram: over N epochs, a learning process at the base station determines which sensors in S can be used to predict the other sensors in S.]

Figure 7.2 Model-driven approach: The learning process takes place at the base station. It aims
at finding a subset of sensors from which the measurements of the other sensors can be predicted.

In practice: Model-driven approaches are particularly appropriate for scenarios where groups of
sensor nodes have correlated measurements. One node of the group sends its measurements to the
base station, which uses them to predict the measurements of other nodes in the group. Because
the base station has a global view of the measurements collected, model-driven
approaches can detect correlations between sensor nodes that may be far apart in the network. A
priori information on the periodicity or stationarity of the measurements can be used to determine
how many observations $N$ should be collected before using the model. For example, a network
collecting outdoor temperature measurements is likely to exhibit diurnal patterns. If the patterns
are consistent over days, observations can be taken over a one-day period and used to model the
measurements of the following days.


7.2.1 Optimization Problem 


The model used at the base station aims at predicting a vector of measurements $\hat{s}_p[t]$ for sensors in
$S_p$ with a prediction model


$$h_\theta : \mathbb{R}^{|S_q|} \to \mathbb{R}^{|S_p|}, \qquad s_q[t] \mapsto \hat{s}_p[t] \qquad (7.4)$$


where the input $s_q[t]$ is the vector of measurements collected from sensors in $S_q$ at time $t$ and $\theta$ is
a vector of parameters. The model-driven approach makes it possible to trade energy for accuracy by carefully
choosing the subsets $S_q$ and $S_p$.

Cost: The cost associated with the query of a subset of sensors $S_q$ is denoted by $C(S_q)$, and aims
at quantifying the energy required to collect the measurements from $S_q$. The cost is divided
into acquisition and transmission costs in [6,7]. Acquisition refers to the energy required to collect
a measurement, and transmission to the energy required for sending the measurement to the base
station. The transmission costs are in practice difficult to estimate because of multi-hop routing
and packet loss issues. A simple and qualitative metric is to define the cost as the number of
sensors in $S_q$.

Accuracy: Let $s_{p_i}[t]$ and $\hat{s}_{p_i}[t]$ be the true measurement and the prediction for the $i$th sensor
in $S_p$ at time $t$. The accuracy associated with $\hat{s}_{p_i}$ is denoted by $R(S_{p_i})$, and the accuracy associated
with the vector of predictions $\hat{s}_p$ is denoted by $R(S_p)$. Different choices are possible to define how
accuracy is quantified. In [6,7], the authors suggest using


$$R(S_{p_i}) = P(s_{p_i}[t] \in [\hat{s}_{p_i}[t] - \epsilon, \hat{s}_{p_i}[t] + \epsilon]) \qquad (7.5)$$


where $i \in S_p$ and $\epsilon$ is a user-defined error threshold. This accuracy metric quantifies the probability
that the true measurement $s_{p_i}[t]$ is within $\pm\epsilon$ of the prediction $\hat{s}_{p_i}[t]$. The overall accuracy of the
vector of predictions $\hat{s}_p[t]$ is defined as the minimum of the $R(S_{p_i})$, that is,


$$R(S_p) = \min_{i} R(S_{p_i}) \qquad (7.6)$$


Optimization loop: The goal of the optimization problem is to find the subset $S_q$ that minimizes
$C(S_q)$, such that $R(S_p) > 1 - \delta$, where $\delta$ is a user-defined confidence level. An exhaustive search
among the set of partitions $\{S_q, S_p\}$ can be computationally expensive. There exist $2^S$ combinations
for the set of predicted sensors, and $S$ can be large. In order to speed up this process, an incremental
search procedure similar to the forward selection algorithm can be used. The search is initialized
with an empty set of sensors $S_q = \emptyset$. At each iteration, for each $i \in S_p$, the cost $C(S_q \cup i)$
and accuracy $R(S_p \setminus i)$ are computed. If one sensor $i$ can be found such that $R(S_p \setminus i) > 1 - \delta$, the
procedure returns $S_q \cup i$ as the set of sensors to query and $S_p \setminus i$ as the set of sensors to predict.
Otherwise, the sensor node that provides the best trade-off is added to the subset $S_q$ and removed
from $S_p$. The best sensor node is the node that maximizes the ratio $R(S_p \setminus i)/C(S_q \cup i)$ [6,7].
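
The incremental search just described can be sketched as follows. This is a minimal illustration, not the implementation of [6,7]: the names `greedy_selection`, `cost`, and `accuracy` are hypothetical, the cost and accuracy functions are supplied by the caller, and the tie-breaking rule (cheapest feasible candidate first) is an assumption, since the text does not specify one. The toy accuracy function simply assumes that a sensor is well predicted whenever a sensor of the same (made-up) correlated group is queried.

```python
def greedy_selection(sensors, cost, accuracy, delta):
    """Forward selection of the queried set S_q and predicted set S_p."""
    s_q, s_p = set(), set(sensors)
    while s_p:
        # Evaluate C(S_q U i) and R(S_p \ i) for every candidate i in S_p.
        candidates = []
        for i in sorted(s_p):
            q_i, p_i = s_q | {i}, s_p - {i}
            candidates.append((i, cost(q_i), accuracy(q_i, p_i)))
        # Stop as soon as one candidate satisfies R(S_p \ i) > 1 - delta.
        feasible = [c for c in candidates if c[2] > 1 - delta]
        if feasible:
            i = min(feasible, key=lambda c: c[1])[0]
            return s_q | {i}, s_p - {i}
        # Otherwise keep the candidate with the best accuracy/cost ratio.
        i = max(candidates, key=lambda c: c[2] / c[1])[0]
        s_q.add(i)
        s_p.remove(i)
    return s_q, s_p


# Toy setting: two groups of correlated sensors (purely illustrative).
groups = [{0, 1, 2}, {3, 4}]


def toy_accuracy(s_q, s_p):
    def r(p):
        group = next(g for g in groups if p in g)
        return 0.99 if group & s_q else 0.5
    return min((r(p) for p in s_p), default=1.0)


queried, predicted = greedy_selection(range(5), len, toy_accuracy, delta=0.05)
print(sorted(queried), sorted(predicted))
```

Here the cost is simply the number of queried sensors, as suggested above, and the search stops after querying one sensor from each correlated group.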


7.2.2 Multivariate Gaussians 


Different learning procedures can be used to compute the model $h_\theta$ in (7.4). The authors of [6,7]
suggest the use of a multivariate Gaussian to represent the set of measurements, which makes it possible
to compute predictions and confidence bounds using computationally efficient matrix products.
Denoting by $\mathbf{s} = (s_1, s_2, \ldots, s_S) \in \mathbb{R}^S$ the random vector of the measurements, where the $i$th
value represents the measurement of the $i$th sensor, the Gaussian probability density function (pdf)
of $\mathbf{s}$ is expressed as


$$p(\mathbf{s} = s) = \frac{1}{\sqrt{(2\pi)^S |\Sigma|}} \exp\left(-\frac{1}{2}(s - \mu)^T \Sigma^{-1} (s - \mu)\right)$$


where $\mu$ and $\Sigma$ are the mean and covariance matrix of the random vector $\mathbf{s}$ (Figure 7.2). Figure 7.3
illustrates a Gaussian over two correlated variables $s_1$ and $s_2$. For a Gaussian, the mean is the
point at the center of the distribution, and the covariance matrix $\Sigma$ characterizes the spread of
the distribution. More precisely, the $i$th element along the diagonal of $\Sigma$, $\sigma_{(i,i)}$, is the variance of
the $i$th variable, and off-diagonal elements $\sigma_{(i,j)}$ characterize the covariances between the pairs $(i, j)$
of variables. A high covariance between two variables means that their measurements are correlated,
such as variables $s_1$ and $s_2$ in Figure 7.3.

Figure 7.3 Gaussian model of two correlated variables. Knowledge of the outcome of $s_1$ makes it
possible to better estimate the outcome of $s_2$, thanks to conditioning.

When sensor measurements are correlated, information on some measurements constrains the
values of other measurements to narrow probability bands. Let $s_q[t] \in \mathbb{R}^{|S_q|}$ and $s_p[t] \in \mathbb{R}^{|S_p|}$ be the
vectors of measurements of sensors in $S_q$ and $S_p$ at time $t$, with $S_p = S \setminus S_q$. The Gaussian pdf


$$p(s_p \mid s_q[t]) = \frac{1}{\sqrt{(2\pi)^{|S_p|} |\Sigma_{p|q}|}} \exp\left(-\frac{1}{2}(s_p - \mu_{p|q})^T \Sigma_{p|q}^{-1} (s_p - \mu_{p|q})\right)$$


of the random variable $s_p$ can be computed using the mean $\mu$ and covariance matrix $\Sigma$ of the model
$p(s)$. This computation, called conditioning, gives [26]


$$\mu_{p|q} = \mu_p + \Sigma_{pq} \Sigma_{qq}^{-1} (s_q[t] - \mu_q)$$
$$\Sigma_{p|q} = \Sigma_{pp} - \Sigma_{pq} \Sigma_{qq}^{-1} \Sigma_{qp}$$


where $\Sigma_{pq}$ denotes the matrix formed by selecting the rows $S_p$ and the columns $S_q$ from the matrix
$\Sigma$. After conditioning, the best approximations $\hat{s}_p[t]$ to sensors in $S_p$ are given by the mean vector


$$\hat{s}_p[t] = \mu_{p|q} \qquad (7.7)$$


The probability 


$$P(s_{p_i}[t] \in [\hat{s}_{p_i}[t] - \epsilon, \hat{s}_{p_i}[t] + \epsilon]) \qquad (7.8)$$


depends on the variance of the measurements of the sensors in $S_p$ after the conditioning. These
variances are known, as they are the diagonal elements of the covariance matrix $\Sigma_{p|q}$, and
make it possible to estimate the quantity (7.8) by referring to a Student's $t$ table [19].
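
As an illustration of conditioning, the sketch below handles the special case of a single queried sensor, for which the matrix inverse reduces to a division by the variance of the queried sensor. The function names and the numerical values are assumptions made for the example. Note also that the probability of falling within the error threshold is computed here from the Gaussian error function rather than from a Student's t table, which is a simplification.

```python
import math


def condition_on_one(mu, sigma, q, s_q):
    """Condition a Gaussian (mean `mu`, covariance `sigma`) on sensor q = s_q.

    Returns {p: (conditional mean, conditional variance)} for every p != q.
    Special case of the conditioning equations with a single queried sensor.
    """
    result = {}
    for p in range(len(mu)):
        if p == q:
            continue
        gain = sigma[p][q] / sigma[q][q]            # Sigma_pq * Sigma_qq^-1
        mean = mu[p] + gain * (s_q - mu[q])         # mu_{p|q}
        var = sigma[p][p] - gain * sigma[q][p]      # diagonal of Sigma_{p|q}
        result[p] = (mean, var)
    return result


def prob_within(var, eps):
    """P(|s - s_hat| <= eps) for a Gaussian error with variance `var`."""
    return math.erf(eps / math.sqrt(2.0 * var))


# Two strongly correlated temperature sensors (made-up parameters).
mu = [20.0, 21.0]
sigma = [[4.0, 3.6],
         [3.6, 4.0]]
mean_1, var_1 = condition_on_one(mu, sigma, q=0, s_q=22.0)[1]
before = prob_within(sigma[1][1], eps=1.0)   # without querying sensor 0
after = prob_within(var_1, eps=1.0)          # after conditioning on sensor 0
print(mean_1, var_1, before, after)
```

Observing the first sensor shifts the prediction for the second one and shrinks its variance, so the probability of being within the error threshold increases.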


7.2.3 Discussion 


With model-driven data acquisition, the communication savings are obtained by reducing the 
number of sensors involved in the data collection task. These approaches can provide important 
communication and energy savings, by leaving the set of sensors in $S_p$ in an idle mode. For
continuous queries, however, the constant solicitation of the same subset of sensors leads to an 
unequal energy consumption among nodes. This issue can be addressed by recomputing from time 
to time the subset of sensors Sp in such a way that sensors whose remaining energy is high are 
favored [16,37]. 

In terms of communication savings, the network load is reduced by a factor that depends on 
the ratio of the total number of sensors over the number of queried sensors. The highest load is in 


$$L_{\max} \sim O(|S_q|)$$


and is sustained by the root node of the tree connecting the sensors in $S_q$.

The main issue of the model-driven approach probably lies in the assumption that the model
is correct, which is a necessary condition for the scheme to be of interest in practice. The model
parameters $\mu$ and $\Sigma$ must be estimated from data, and any changes in the data distribution
will lead to unbounded errors. Also, the presence of outliers in the queried measurements may
strongly affect the quality of the predictions. Model-driven data acquisition therefore
allows potentially high communication savings, but is not very robust to unexpected measurement
variations and nonstationary signals.


7.3 Replicated Models 


Replicated models (RM) form a large class of approaches allowing sensor nodes to remove temporal or spatial
redundancies between sensor measurements in order to decrease the network load. They were
introduced in the field of sensor networks by Olston et al. in 2001 [27]. Their rationale
consists in having identical prediction models running both within the network and at the
base station. Other names have been given to this approach in the literature, such as approximate
caching [27], dual Kalman filters [11], replicated dynamic probabilistic models [5], dual
prediction scheme [18,32], and model-aided approach [36]. We will adopt here the RM denomination
since, in our view, it is the one that best expresses the common rationale of these
approaches.

Figure 7.4 Replicated models: Only the models are communicated to the base station. As long
as the model is correct, no communication is necessary.

Figure 7.4 gives an illustration of the RM approach. Four models, $h_i$, are used to predict the
sensor measurements, one for each sensor $i$. The base station and the sensors use the same models.
At the base station, the model is used to predict the measurements. On the sensor nodes, the
model is used to compute the same prediction as the base station, and to compare it with the true
measurement. If the prediction is more than a user-defined $\epsilon$ away from the true measurement,
a new model is communicated. RM approaches, therefore, guarantee the observer bounded
approximation errors. RM may be used to predict measurements both over the temporal and spatial
domains. We first cover the approaches where models only take into account temporal variations,
and then present the strategies that have been investigated to extend RM to the modeling of spatial
variations.

In practice: In many environmental monitoring applications, the observer can tolerate approximations
for the collected measurements. For example, in plant growth studies, ecologists reported that
it is sufficient to have an accuracy of ±0.5°C and 2% for temperature and humidity measurements,
respectively [7]. RM provide an appropriate mechanism to deal with these scenarios.


7.3.1 Temporal Modeling 


Temporal modeling with RM is the simplest approach, as sensors independently produce prediction 
models for their own measurements [11,27,32]. Models are of the kind 


$$\hat{s}_i[t] = h_i(x, \theta)$$


where $x$ is a vector of inputs that consists, for example, of the past measurements of sensor $i$, and
$\theta$ is a vector of parameters. There is one model for each sensor node $i$, whose task is to produce
predictions for the upcoming measurements of the sensor node. Once such a model is produced,
it is communicated to the base station, which uses it to infer the actual measurements taken by
sensor $i$.

Sensor nodes use their own copy of the model to compare the model predictions with the
sensor measurements. The model is assumed to be correct as long as the predicted and the true
measurements do not differ by more than a user-defined error threshold $\epsilon$, that is,


$$|\hat{s}_i[t] - s_i[t]| < \epsilon$$


Once the prediction is more than $\epsilon$ away from the true measurement, an update is sent to the base
station to notify it that the error threshold is not satisfied. The update may consist of the measurement,
or of the parameters of a new model built on the basis of the most recent measurements. This way,
RM guarantee the observer that all measurements provided at the base station are within $\pm\epsilon$ of the
true measurements.

RM reduce communication since, as long as the model predictions are within $\pm\epsilon$ of
the true measurement, no updates occur between the sensor nodes and the base station. The inputs
of the model, if they depend on the sensor measurements, are inferred by the base station using
the measurements predicted by the model. The sensor nodes also use past predicted measurements
as inputs to their models. This way, the sensor nodes and the base station apply exactly the same
procedure. Sensor measurements are sent only when a model update is needed.

The pseudocode for running RM on a sensor node is given by Algorithm 7.1. The subscript $i$ is
dropped for the sake of clarity. On the base station, the scheme simply consists in using the most
recently received model to infer the sensor's measurements. Four types of techniques have been
proposed to run temporal RM [12,14,27,32,37]; they are presented in detail in the following sections. The
following summary gives an overview of their differences:


Algorithm 7.1 RM: Replicated model algorithm


Input:
ε: Error threshold.
h: Model.


Output:
Packets containing model updates sent to the base station (BS).


t ← 1
θ[t] ← init(h)
θ_last ← θ[t]
sendNewModel(h, θ_last) to BS
while True do
    t ← t + 1
    s[t] ← getNewReading()
    ŝ[t] ← getPrediction(h, θ_last)
    θ[t] ← update(h, θ[t − 1], s[t])
    if |ŝ[t] − s[t]| > ε then
        θ_last ← θ[t]
        sendNewModel(h, θ_last) to BS
    end if
end while


■ Constant model [14,27]: The simplest model, nonetheless often efficient. It does not
rely on any learning procedure, and as a result cannot represent complex variations.

■ Kalman filter [12]: The technique provides a way to filter the noise in the measurements.

■ Autoregressive model [37]: It can predict more complex variations than the constant
model, but requires a learning stage.

■ Least mean square (LMS) filter [32]: The filter is based on an autoregressive model that
adapts its coefficients over time.
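
The sensor-side loop of Algorithm 7.1 can be sketched in Python as follows. This is a simplified illustration under stated assumptions: readings are taken from a list instead of a sensor, sending a model is replaced by appending to a list, the model is passed as three hypothetical callables (`init`, `predict`, `update`), and the model is initialized from the first reading. The constant model is used as the example, since its only parameter is the last transmitted measurement.

```python
def run_replicated_model(readings, eps, init, predict, update):
    """Sensor-side RM loop; returns the updates sent and the approximations
    that the base station reconstructs from them."""
    theta = init(readings[0])
    theta_last = theta                  # parameters last sent to the BS
    sent = [(0, theta_last)]            # stands in for sendNewModel(...)
    approximations = [predict(theta_last)]
    for t in range(1, len(readings)):
        s = readings[t]
        s_hat = predict(theta_last)     # same prediction as the base station
        theta = update(theta, s)
        if abs(s_hat - s) > eps:        # model no longer within the threshold
            theta_last = theta
            sent.append((t, theta_last))
            s_hat = predict(theta_last)
        approximations.append(s_hat)
    return sent, approximations


# Constant model: the parameter is simply the most recent measurement.
readings = [20.0, 20.3, 20.8, 21.4, 21.6, 22.9, 23.0, 22.2]  # made-up data
sent, approx = run_replicated_model(
    readings,
    eps=1.0,
    init=lambda s: s,
    predict=lambda theta: theta,
    update=lambda theta, s: s,
)
print([t for t, _ in sent], approx)
```

On this short series, only three of the eight measurements are transmitted, and every approximation stays within the error threshold of the corresponding reading.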


7.3.1.1 Constant Model 


In Refs. [14,27], RM are implemented with constant prediction models

$$\hat{s}_i[t] = s_i[t - 1]$$


Although simple, the constant model is well suited for slowly varying time series. Also, it has the
advantage of not depending on any parameters. This keeps the size of an update to a minimum,
as it consists only of the new measurement $s_i[t]$. This makes the constant model well suited to
reducing the communication between the sensor and the base station. Figure 7.5 illustrates how a
constant model represents a temperature time series. The time series was obtained from the ULB
Greenhouse dataset [15] on August 16, 2006. Data were taken every 5 min, for a one-day period,
giving a set of 288 measurements. The measurements are reported with the red dashed line,
and the approximations obtained by a constant model with an error threshold of $\epsilon = 1$°C are
reported with the black solid line. Updates are marked with black dots at the bottom of the figure.
Using RM, the constant model reduces the number of measurements transmitted to 43,
resulting in about 85% communication savings.

Figure 7.5 A constant model acting on a temperature time series from the ULB Greenhouse dataset
[15] (August 16, 2006), with an error threshold set to $\epsilon = 1$°C.


7.3.1.2 Kalman Filter 


In Ref. [12], the authors suggested using Kalman filters as a generic approach to modeling sensor
measurements for RM. A Kalman filter is a stochastic, recursive data filtering algorithm introduced
in 1960 by Kalman [13]. It can be used to estimate the dynamic state of a system at a given time $t$
by using noisy measurements issued at times $t_0 : t - 1$.

The main goal of a Kalman filter is to provide good estimations of the internal state of a
system by using a priori information on the dynamics of the system and on the noise affecting the
measurements. The system model is represented in the form of the following equations


$$x[t] = F x[t - 1] + v_p[t] \qquad (7.9)$$
$$s[t] = H x[t - 1] + v_m[t] \qquad (7.10)$$


where $x[t]$ is the internal state of the process monitored, which is unknown, and $s[t]$ is the
measurement obtained by a sensor at time instant $t$. The matrix $F$ is the state transition matrix,
which relates the system states between two consecutive time instants. The matrix $H$ relates the
system state to the observed measurement. Finally, $v_p[t]$ and $v_m[t]$ are the process noise and
measurement noise, respectively.

The state of the system is estimated in two stages. First, a prediction/estimation stage is used 
to propagate the internal state of the system by means of Equation (7.10). Second, a correction 
stage fine-tunes the prediction step by incorporating the actual measurement in such a way that the 
error covariance matrix between the measurements and the predictions is minimized. Eventually,
a prediction $\hat{s}[t]$ is obtained, expected to be closer to the true state of the system than the actual
measurement $s[t]$. The details of the mathematical steps can be found in [12].
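
The two stages can be illustrated with a scalar filter. The sketch below follows the standard textbook recursion (prediction of the state and of its error variance, then correction with the Kalman gain), which is not necessarily the exact formulation of [12]; in particular, the measurement is related to the predicted state at the current time instant. The function name and the noise variances `q` and `r` are assumptions made for the example.

```python
def kalman_step(x_est, p_est, s, f=1.0, h=1.0, q=0.0, r=1.0):
    """One prediction/correction cycle of a scalar Kalman filter.

    x_est, p_est: previous state estimate and its error variance.
    s: new measurement; f, h: scalar counterparts of F and H.
    q, r: process and measurement noise variances (assumed known).
    """
    # Prediction stage: propagate the state and its uncertainty.
    x_pred = f * x_est
    p_pred = f * p_est * f + q
    # Correction stage: blend the prediction with the measurement.
    gain = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + gain * (s - h * x_pred)
    p_new = (1.0 - gain * h) * p_pred
    return x_new, p_new


# Constant signal observed three times, starting from a poor initial guess.
x, p = 0.0, 1.0
estimates = []
for s in [5.0, 5.0, 5.0]:
    x, p = kalman_step(x, p, s)
    estimates.append(x)
print(estimates, p)
```

With no process noise, each new measurement moves the estimate closer to the observed value while the error variance decreases.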


7.3.1.3 Autoregressive Models 


In [37], the authors suggest the use of autoregressive models, a well-known family of models in time
series prediction [2,4]. An autoregressive model is a function of the form


$$\hat{s}[t] = \theta_1 s[t - 1] + \theta_2 s[t - 2] + \ldots + \theta_p s[t - p] \qquad (7.11)$$


that aims at predicting the measurement at time $t$ by means of a linear combination of the
measurements collected at the previous $p$ time instants. The vector of parameters has $p$ elements
$\theta = (\theta_1, \theta_2, \ldots, \theta_p)^T$, and $p$ is called the order of the model. Using the notation $x[t] =
(s[t - 1], s[t - 2], \ldots, s[t - p])^T$ to denote the vector of inputs of the model at time instant $t$, the
relationship can be written as


$$\hat{s}[t] = x[t]^T \theta \qquad (7.12)$$


The estimation of the vector of parameters $\theta$ is obtained by first collecting $M$ measurements, and
by applying the standard regression procedure [9] on the sensor node. The set of parameters
is then communicated to the sink, and used until the prediction error becomes higher than the
user-defined threshold $\epsilon$. A new model is then sent to the base station. The authors of [37] also
considered the fact that a measurement not correctly predicted by the model may be an outlier,
and suggested using statistical tests to determine whether or not to send an update.
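
The learning stage can be sketched as follows, assuming that the regression procedure is ordinary least squares solved through the normal equations (the text only refers to the standard regression procedure, so this choice is an assumption). The helper names `solve` and `fit_ar` are hypothetical, and the example series is generated from a known second-order model so that the recovered parameters can be checked.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ar(series, p):
    """Least squares estimate of theta for an autoregressive model of order p."""
    # Each row is the input vector x[t] = (s[t-1], ..., s[t-p]).
    rows = [[series[t - k] for k in range(1, p + 1)]
            for t in range(p, len(series))]
    targets = [series[t] for t in range(p, len(series))]
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
            for i in range(p)]
    rhs = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve(gram, rhs)


# Noise-free series generated by s[t] = 1.5 s[t-1] - 0.7 s[t-2].
series = [1.0, 2.0]
for _ in range(18):
    series.append(1.5 * series[-1] - 0.7 * series[-2])
theta = fit_ar(series, 2)
print(theta)
```

On a sensor node, only the resulting vector of parameters would be sent to the sink, which keeps the update small for low model orders.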


7.3.1.4 Least Mean Square Filter 


In Ref. [32], the authors argue that adaptive filter theory [1] offers an alternative solution for
performing predictions, without requiring a priori information about the statistical properties of
the phenomenon of interest. Among the variety of techniques, they chose the LMS algorithm,
arguing that it is known to provide good performance in a wide spectrum of applications. As in
Equation (7.12), the LMS is an autoregressive filter, where the output is a linear combination of
previous measurements $\hat{s}[t] = x[t]^T \theta[t]$. In contrast to the approach proposed in [37], the vector
of parameters $\theta[t]$ is updated over time as new measurements are taken. The LMS theory gives the
following set of three equations for updating the parameters:


$$\hat{s}[t] = x[t]^T \theta[t]$$
$$e[t] = s[t] - \hat{s}[t]$$
$$\theta[t + 1] = \theta[t] + \mu x[t] e[t]$$


where $e[t]$ is the error made by the filter at epoch $t$ and $\mu$ is a parameter called the step size, which
regulates the speed of convergence of the filter. The order $p$ of the filter and the
step size $\mu$ are the only parameters that must be defined a priori. The authors of [32] suggested on the
basis of their experiments that orders from 4 to 10 provided good results. Concerning the choice
of the parameter $\mu$, they suggest estimating it by setting $\mu = 10^{-2}/E$, where $E = \frac{1}{M} \sum_{t=1}^{M} |s[t]|^2$
is the mean input power of the signal.
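
The three LMS update equations translate directly into code. The sketch below is a minimal illustration: the function name `lms_filter` is hypothetical, the parameters are initialized to zero (an assumption), and the inputs are the true past measurements, whereas in an RM setting past predictions would be used whenever no update has been sent.

```python
def lms_filter(series, p, mu):
    """Run an LMS filter of order p with step size mu over a series.

    Returns the predictions, the errors, and the final parameter vector.
    """
    theta = [0.0] * p                    # assumed initial parameters
    predictions, errors = [], []
    for t in range(p, len(series)):
        x = [series[t - k] for k in range(1, p + 1)]   # s[t-1], ..., s[t-p]
        s_hat = sum(xi * th for xi, th in zip(x, theta))
        e = series[t] - s_hat
        # theta[t+1] = theta[t] + mu * x[t] * e[t]
        theta = [th + mu * xi * e for th, xi in zip(theta, x)]
        predictions.append(s_hat)
        errors.append(e)
    return predictions, errors, theta


# Constant signal: the prediction error is halved at every step for mu = 0.5.
predictions, errors, theta = lms_filter([1.0, 1.0, 1.0, 1.0], p=1, mu=0.5)
print(errors, theta)
```

The step size trades off convergence speed against stability: larger values adapt faster but may cause the filter to diverge.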


7.3.2 Spatial Modeling 


Two different approaches were investigated to model spatial dependencies with RM. In [33], RM 
are used on each edge of a routing tree that connects the nodes to the base station (edge monitoring). 
In Ref. [5], the network is partitioned in groups of nodes, with one model for each group (clique 
models). 


7.3.2.1 Edge Monitoring 


In this approach [33], it is assumed that nodes are connected to the base station by means of a
routing tree. As with temporal modeling, there is one model $h_i$ for each sensor $i$. Additionally,
there is also one model for each edge connecting neighboring nodes in the routing tree. More
specifically, assuming that there is an edge connecting node $i$ to $j$, a model $h_{j-i}$ is also produced by
node $j$ to represent the difference in value between the measurement $s_j[t]$ and the approximation
$\hat{s}_i[t]$. Denoting this difference by $\Delta_{j-i}[t] = s_j[t] - \hat{s}_i[t]$, the model has the form


$$\hat{\Delta}_{j-i}[t] = h_{j-i}(x, \theta)$$


where $x$ is a vector of inputs that consists, for example, of the past differences between measurements
of sensors $i$ and $j$, and $\theta$ is a vector of parameters. The simplest model is the constant model,
which was the only one considered in [33], and which is defined by


$$\hat{\Delta}_{j-i}[t] = \Delta_{j-i}[t - 1]$$


Note that more complex models such as autoregressive models could also be used.

Figure 7.6 Replicated models with edge monitoring: Models $h_i$ are used to infer sensor measurements,
while models $h_{j-i}$ are used to monitor the differences between the measurements of
two adjacent nodes.

As in temporal modeling, each sensor node $i$ has its own model $h_i$, which is updated when the
prediction error exceeds the user-defined error threshold $\epsilon$. The copy of the model is, however, not
maintained by the base station, but by the parent node in the routing tree. Each node $j$ that has
children maintains a model $h_{j-i}$ for each of its children $i$. A copy of these models is maintained
by the base station. A second user-defined error threshold $\epsilon_\Delta$ is used to determine when to update
these models. The way models are distributed in a network is illustrated in Figure 7.6 for a network
of four nodes and a routing tree of depth three. Models produced by nodes are listed above each
node, and models used to get predictions are listed below each node.

The measurement of a node $i$ is obtained by the base station by summing the prediction for
the root node measurement and the predictions for the differences between all pairs of nodes that
separate node $i$ from the base station. For example, in the network illustrated in Figure 7.6, a
prediction for sensor 4 is obtained by summing


SA =5 10 + Ay_ale] + Âl] 


Since $\Delta_{j-i}[t] = s_j[t] - \hat{s}_i[t]$ and the RM scheme ensures $|\hat{s}_i[t] - s_i[t]| < \epsilon$ and
$|\hat{\Delta}_{j-i}[t] - \Delta_{j-i}[t]| < \epsilon_\Delta$, it follows that


■ A prediction for the root node can be obtained with an accuracy $\pm\epsilon$.
■ A prediction for a node $l$ hops away from the root node can be obtained with an accuracy
$\pm(\epsilon + l(\epsilon + \epsilon_\Delta))$.


The accuracy therefore depends on the depth of a node in the routing tree, and lower $\epsilon$ values must
therefore be used to achieve the same accuracy as in temporal modeling.

Figure 7.7 Clique models: The network here is partitioned into two cliques $S_1 = \{4, 5, 6\}$ and
$S_2 = \{1, 2, 3\}$, with replicated models $h_1$ and $h_2$ for each clique. The clique root of $S_1$ is sensor
node 3 and the clique root of $S_2$ is sensor node 1.


7.3.2.2 Clique Models 


The rationale of clique models, investigated in [5], is to partition the network into groups of
sensors, called cliques, with one RM for each clique. Note that the term "clique" here refers to a
group of nodes and is not related to the notion of clique in graph theory. The measurements of
sensors in a clique are gathered at a common sensor node called the clique root. The clique root may
not be part of the clique. Let $\{S_k\}_{1 \le k \le K}$ be a partitioning of $S$ into $K$ cliques, that is, a set of $K$
cliques such that $S = \bigcup_{1 \le k \le K} S_k$. Let $i_k^{root} \in S$ be the sensor node that takes the role of the root
of clique $k$.

The measurements of the set of sensors $i \in S_k$ are gathered at the clique root $i_k^{root}$, where data
are modeled by a model $h_k(x, \theta)$. Inputs $x$ may take values in the set of measurements of the sensors
in the clique. The clique root then transmits to the sink the minimal subset of parameters/data
such that the measurements $s_i[t]$ and their model counterparts $\hat{s}_i[t]$ do not differ by more than $\epsilon$. An
illustration of a clique partition is given in Figure 7.7. Note that updating the model may require a
clique root to rely on multi-hop routing. For example, sensor node 3 may not be in communication
range of the base station and may require some nodes of $S_2$ to relay its data.

The choice of the model and the partitioning of cliques are clearly central pieces of the system.
Inspired by the work of [6] concerning model-driven data acquisition, the authors of [5] suggest
using Gaussian models. The clique root collects data from all the sensors in the clique for $N$ epochs,
computes the mean vector and covariance matrix, and communicates these data to the base station.
Then, at every epoch, the clique root determines what measurements must be sent so that the base
station can infer the missing measurements within a user-defined error threshold $\epsilon$.

The goal of the approach is to reduce the communication costs. These are divided into intra- 
source and source-base station costs. The former is the cost incurred in the process of collecting 
data by the clique root to check if the predictions are correct. The latter is the cost incurred while 
sending the set of measurements to the sink. Authors show that the problem of finding optimal 
cliques is NP-hard, and propose the use of a centralized greedy algorithm to solve the problem. The 
heuristic starts by considering a set of S cliques, that is, one for each sensor, and then assesses the 
reduction of communication obtained by fusing all combinations of cliques. The algorithm stops 
once fusion leads to higher communication costs.


7.3.3 Discussion


RM have a high potential in reducing communications, and their main advantage is to guarantee 
bounded approximation errors. Temporal modeling is easy to implement, and the different pro- 
posed approaches do not require much computation, which makes them suitable for the limited 
computational resources of sensor nodes. In terms of communications savings, the network load 
of sensors is reduced by a factor proportional to the number of updates required to maintain syn- 
chronized models between the sensors and the base station. The highest network load is sustained 
by the base station and depends on the number of updates sent during an epoch. If all the sensor 
measurements can be predicted, then no update is sent and the load is therefore null for all sensors. 
At the other extreme, if all sensor nodes send an update, the load distribution is similar to that of 
collecting all the measurements. Therefore, RM give


$$L_{\max} \sim O(S)$$

The modeling of spatial dependencies is attractive as spatial correlations are very often observed in 
sensor network data. Also, it is likely that all temporal models have to update their parameters at 
about the same time, that is, when an unexpected event occurs in the environment, for example. The 
modeling of spatial dependencies, however, raises a number of concerns. In the edge-monitoring 
approach, the error tolerance € must be reduced in order to provide the same accuracy guarantee 
as in temporal modeling approaches [33]. In clique models, the partitioning of the network is a 
computationally intensive task that, even if undertaken by the base station, raises scalability issues 
[5]. In terms of communication savings, the network load of sensors is reduced, as for temporal 
modeling, by a factor that depends on the average number of updates. The savings can, however, be 
much higher, particularly, in situations where all the measurements increase by the same amount, 
for example. 

An issue common to all the approaches based on RM is packet losses. In the absence of 
notification from a sensor node, the base station deems the prediction given by the shared model 
to fall within the € error tolerance. Additional checking procedures must, therefore, be considered 
for this scheme to be completely reliable. To that end, a “watchdog” regularly checking the sensor 
activity and the number of sent packets can be set up, as discussed in [32], for example. By keeping 
the recent history of sent updates in the sensor node memory, these can be communicated to
the sink at checking intervals if the number of sent packets differs from the number of received
packets. Node failure is detected by the absence of acknowledgment from the sensor node to the
watchdog request. Finally, the choice of the model is also an important factor in the efficiency of 
RM. Techniques based on model selection can be used to tackle this problem [15,18]. 


7.4 Aggregative Approaches 


Aggregative approaches are based on aggregation services, which allow to aggregate sensor data in a 
time- and energy-efficient manner. A well-known example of aggregation service is the TAG system, 
developed at the University of Berkeley, California [24,25]. TAG stands for Tiny AGgregation and 
is an aggregation service for sensor networks that has been implemented in TinyOS, an operating 
system with a low memory footprint specifically designed for wireless sensors [20]. In TAG, an 
epoch is divided into time slots so that sensors’ activities are synchronized according to their depth 
in the routing tree. Any algorithm can be relied on to create the routing tree, as long as it allows 
data to flow in both directions of the tree and does not send duplicates [24]. 
Figure 7.8 Multi-hop routing along a routing tree, and node synchronization for an efficient 
use of energy resources. (a) Routing tree of depth three. (b) Activities carried out by sensors 
depending on their level in the routing tree. (Adapted from Madden, S. et al., TAG: a tiny 
aggregation service for ad-hoc sensor networks. In Proceedings of the 5th ACM Symposium on 
Operating Systems Design and Implementation (OSDI), Vol. 36, pp. 131-146. ACM Press, 2002. With 
permission.) 


The TAG service focuses on low-rate data collection tasks, which permit loose synchronization 
of the sensor nodes. The overhead implied by the synchronization is therefore assumed to be low. 
The goal of synchronization is to minimize the amount of time spent by sensors in powering their 
different components and to maximize the time spent in the idle mode, in which all electronic 
components are off except the clock. Since the energy consumption is several orders of magnitude 
lower in the idle mode than when the CPU or the radio is active, synchronization significantly 
extends the wireless sensors’ lifetime. An illustration of the sensors’ activities during an epoch is 
given in Figure 7.8 for a network of four nodes with a routing tree of depth three. Note that the 
synchronization is maintained at the transport layer of the network stack and does not require 
precise synchronization constraints. Aggregation services such as TAG reduce energy 
consumption both by carefully scheduling the sensor nodes’ activity and by allowing the data to be 
aggregated as they are routed to the base station. 
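
The depth-based scheduling idea can be sketched as follows. This is a simplified illustration, not the TAG implementation: the function name `epoch_schedule`, the slot numbering (slot 0 first, deepest nodes transmitting first), and the one-slot listening window are all assumptions made for the example.

```python
def epoch_schedule(parent):
    """Assign listen/transmit slots within one epoch from the routing tree.

    parent: dict mapping each node to its parent (None for the root).
    Assumption: the root has depth 1 and the deepest nodes transmit in slot 0,
    so that aggregates reach the root in the last slot of the epoch.
    """
    def depth(node):
        d = 1
        while parent[node] is not None:
            node = parent[node]
            d += 1
        return d

    depths = {node: depth(node) for node in parent}
    max_depth = max(depths.values())
    has_children = {p for p in parent.values() if p is not None}

    schedule = {}
    for node, d in depths.items():
        transmit = max_depth - d
        # A node only needs to listen while its children transmit, one slot earlier.
        listen = transmit - 1 if node in has_children else None
        schedule[node] = {"listen": listen, "transmit": transmit}
    return schedule
```

In every slot not listed for a node, that node can stay in the idle mode, which is where the energy savings of the synchronization come from.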

Using the terminology of [24,25], an aggregate of data is called a partial state record and is 
denoted by \(\langle \cdot \rangle\). It can be any data structure, such as a scalar, a vector, or a matrix, for example. 
Partial state records are initialized locally on all nodes, and then communicated and merged in the 
network. When the partial state record is eventually delivered by the root node to the base station, 
its elements may be recombined in order to provide the observer with the final output. Methods 
based on aggregation require the definition of three primitives [24,25]: 


■ An initializer \(\mathrm{init}\) that creates a partial state record 
■ An aggregation operator \(f\) that merges partial state records 
■ An evaluator \(e\) that returns, on the basis of the partial state record finally delivered to the 
base station, the result required by the application 


The partial state records are merged from the leaf nodes to the root, along a synchronized routing 
tree such as the one in Figure 7.8. 

In practice: Aggregation services were first shown [24,25] to be able to compute simple operations 
like the minimum, the maximum, the sum, or the average of a set of measurements. For example, 
for computing the sum, the following primitives can be used: 


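\[
\begin{aligned}
\mathrm{init}(s_i[t]) &= \langle s_i[t] \rangle \\
f(\langle S_1 \rangle, \langle S_2 \rangle) &= \langle S_1 + S_2 \rangle \\
e(\langle S \rangle) &= S
\end{aligned}
\]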


Measurements are simply added as they are forwarded along the routing tree. The resulting 
aggregate obtained at the base station is the overall sum of the measurements. The main advantage 
of aggregation is that the result of an operator is computed without collecting all the measurements 
at the base station. This can considerably reduce the amount of communication. 
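
The three primitives for the sum can be sketched in a few lines. This is an illustrative simulation on a single machine, assuming a routing tree given as a dictionary of children; the helper name `aggregate` is hypothetical and a real aggregation service would execute the merge on the nodes themselves.

```python
def init_sum(measurement):
    """Initializer: wrap a local measurement in a partial state record."""
    return (measurement,)


def merge_sum(record_a, record_b):
    """Aggregation operator: merge two partial state records."""
    return (record_a[0] + record_b[0],)


def evaluate_sum(record):
    """Evaluator: extract the final result at the base station."""
    return record[0]


def aggregate(children, readings, node, init, merge):
    """Return the partial state record of the subtree rooted at `node`.

    children: dict node -> list of child nodes in the routing tree.
    readings: dict node -> local measurement for the current epoch.
    """
    record = init(readings[node])
    for child in children.get(node, []):
        record = merge(record, aggregate(children, readings, child, init, merge))
    return record
```

Each node forwards a single record to its parent regardless of the size of its subtree, which is the communication saving described above.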

In data modeling, aggregation services can be used to compute the parameters of models. For 
example, let us consider a sensor network monitoring the temperature in a room where an air 
conditioning system is set at 20°C. Most of the time, the temperature measurements of sensor 
nodes are similar, and can be approximated by their average measurements. The average model 
[5,10,24] was one of the first proposed, as its implementation is fairly straightforward. The interest 
of such a model can be to detect, for example, when a window or a door is opened. To do so, the 
average of the measurements is first computed by means of an aggregation service and retrieved 
at the base station. The result is then transmitted back to the sensor nodes, which can compare 
their measurements with the average measurement. If the difference is higher than some user-defined 
threshold, a sensor node can notify the base station that its local measurement is not in agreement 
with the average measurement. 

In [8], the authors showed that the coefficients of a linear model can be computed using aggregation 
services. Their approach, called distributed regression, allows more complex models to be used for 
representing sensor measurements as a function of spatial coordinates. The average model is 
presented first in the following, and is followed by the presentation of the distributed regression 
algorithm. 


7.4.1 Average Model 


The average model is a simple model that illustrates well the rationale of aggregation. It consists in 
modeling all the sensor measurements by their average value, that is, 

\[ \hat{s}_i[t] = \mu[t] \]

where \(\mu[t] = \frac{1}{S} \sum_{i=1}^{S} s_i[t]\) is the average of the measurements of all \(S\) sensors at epoch \(t\) 
[5,10,24]. The average is obtained by the following primitives: 


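\[
\begin{aligned}
\mathrm{init}(s_i[t]) &= \langle 1, s_i[t] \rangle \\
f(\langle C_1, S_1 \rangle, \langle C_2, S_2 \rangle) &= \langle C_1 + C_2, S_1 + S_2 \rangle \\
e(\langle C, S \rangle) &= S / C
\end{aligned}
\]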

Figure 7.9 Aggregation service at work for computing the average \(\mu[t] = \frac{1}{S} \sum_{i=1}^{S} s_i[t]\) of the 
measurements taken by sensors at epoch \(t\). 


The partial state record \(\langle C, S \rangle\) is a vector of two elements consisting of the count and the sum 
of sensor measurements. The aggregation process is illustrated in Figure 7.9. This way, only two 
pieces of data are transmitted by all sensor nodes. The main advantage of aggregation is that all 
nodes send exactly the same amount of data, which is independent of the number of sensors. It, 
therefore, provides a scalable way to extract information from a network. Also, it may dramatically 
decrease the highest network load, typically sustained by the root node. In the network of nine 
sensors represented in Figure 7.9, the transmission of all measurements to the base station would 
cause the root node to send nine measurements at every epoch. This number is reduced to two 
thanks to the aggregation process. 
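
The count-and-sum record can be sketched as follows for a network of nine sensors. The tree layout and the readings are hypothetical, and `collect` is an illustrative helper that simulates the in-network merge on one machine.

```python
def init(measurement):
    """Partial state record (count, sum) for a single measurement."""
    return (1, measurement)


def merge(record_a, record_b):
    """Merge two (count, sum) records element by element."""
    return (record_a[0] + record_b[0], record_a[1] + record_b[1])


def evaluate(record):
    """Average recovered at the base station from the final record."""
    count, total = record
    return total / count


def collect(children, readings, node):
    """Record sent by `node` to its parent: always two numbers."""
    record = init(readings[node])
    for child in children.get(node, ()):
        record = merge(record, collect(children, readings, child))
    return record
```

Whatever the size of the subtree below it, each node transmits a record of two elements, so the root sends two values per epoch instead of nine.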

Once obtained at the base station, aggregates can be transmitted from the base station to sensor 
nodes [8]. This allows sensor nodes to compare their measurement to the average measurement, 
and to provide the base station with their measurement if 


\[ |s_i[t] - \mu[t]| > \epsilon \]


where \(\epsilon\) is a user-defined error threshold. 

This is illustrated in Figure 7.10, where nodes 1 and 8 actually send their true measurement after 
receiving the feedback \(\mu[t]\) from the base station. Such a strategy bounds the approximation 
errors of the average model. It, however, implies additional communication rounds between the 
base station and the sensor nodes, which are expensive in terms of communication. 
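
The local check performed after the feedback round can be sketched as follows. The function name and the readings are hypothetical; in a deployment each node would evaluate the condition on its own measurement only.

```python
def sensors_to_report(readings, mean, epsilon):
    """Return the ids of sensors whose measurement differs from the mean by more than epsilon.

    readings: dict sensor id -> measurement at the current epoch.
    """
    return sorted(i for i, s in readings.items() if abs(s - mean) > epsilon)
```

For instance, with a fed-back average of 20.0 and a threshold of 1.0, sensors reading 25.0 and 14.0 would send their true measurements, while sensors reading 20.1 and 19.8 would stay silent.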


7.4.2 Distributed Regression 


Similar to the average model, an aggregative approach can be used to compute the regression 
coefficients of a linear model. The approach was investigated by Guestrin et al. in [8], who 
relied on basis functions to approximate sensor network measurements [9]. The basis functions 
may be defined over space and time, which in some cases allows the overall 
spatiotemporal variations to be represented compactly by a small set of coefficients. 



Figure 7.10 The aggregate \(\mu[t]\) can be communicated to all sensors, allowing them to compute 
locally \(|s_i[t] - \mu[t]|\) and to send their true measurement \(s_i[t]\) if the difference \(|s_i[t] - \mu[t]|\) is 
higher than a user-defined error threshold \(\epsilon\). In this example, sensors 1 and 8 update their 
measurements. 


More precisely, let \(H = \{\pi_1, \ldots, \pi_p\}\) be a set of \(p\) basis functions which are used to represent 
the sensor measurements. This set must be defined by the observer, prior to running the algorithm. 
The inputs of these basis functions can be functions of the time \(t\) or of the sensor coordinates 
\(c = (c_1, c_2, c_3)\) (assuming 3D coordinates), for example. Let \(p\) be the overall number of basis 
functions and let \(\theta_j\) be the coefficient of the \(j\)th basis function. The approximation to a sensor 
measurement at time \(t\) and location \(c\) is given by 

\[ \hat{s}(c, t) = \sum_{j=1}^{p} \theta_j \pi_j(c, t) \]


Example 7.3: 
A quadratic regression model over time \(\hat{s}(c, t) = \theta_1 t + \theta_2 t^2\) is defined by two basis functions 
\(\pi_1(c, t) = t\) and \(\pi_2(c, t) = t^2\). The addition of \(\pi_3(c, t) = c_1\) and \(\pi_4(c, t) = c_2\) gives a model that 
captures correlations over space. An intercept can be added with \(\pi_5(c, t) = 1\). 
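
The basis functions of this example can be written down directly. The sketch below assumes that the two spatial basis functions return the first two coordinates of the sensor, and the coefficient values used with `predict` are arbitrary illustrations.

```python
# Basis functions of the example: each takes coordinates c and time t.
basis = [
    lambda c, t: t,        # linear trend over time
    lambda c, t: t ** 2,   # quadratic trend over time
    lambda c, t: c[0],     # first spatial coordinate (assumed)
    lambda c, t: c[1],     # second spatial coordinate (assumed)
    lambda c, t: 1.0,      # intercept
]


def design_row(c, t):
    """Values of all basis functions at location c and time t."""
    return [pi(c, t) for pi in basis]


def predict(theta, c, t):
    """Approximation at (c, t): weighted sum of the basis functions."""
    return sum(th * x for th, x in zip(theta, design_row(c, t)))
```

Because `predict` only needs coordinates and a time, it can be evaluated at locations where no sensor is deployed.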


Using the notation defined earlier, we have a model \(h_\theta\) that represents the overall variations in the 
sensor field by 

\[
\begin{aligned}
h_\theta : \mathbb{R}^p &\to \mathbb{R} \\
x &\mapsto \hat{s}(c, t) = \sum_{j=1}^{p} \theta_j \pi_j(c, t)
\end{aligned}
\]

where the inputs \(x = (\pi_1(c, t), \ldots, \pi_p(c, t))\) are functions of time and coordinates, and the 
parameters \(\theta = (\theta_1, \ldots, \theta_p)\) are the coefficients of the linear model. Approximations \(\hat{s}_i[t]\) for 
sensor \(i\) are given by specifying the coordinates \(c_i\) of the sensor in the model, that is, 
\(\hat{s}_i[t] = \sum_{j=1}^{p} \theta_j \pi_j(c_i, t)\). An interesting feature of this model is that it not only provides 
approximations to sensor measurements but also provides predictions for all locations in the field. 

Assuming that \(N_i\) measurements \(s_i[t]\) have been taken at locations \(c_i\) for \(N_i\) epochs \(t\), let 
\(N = \sum_{i=1}^{S} N_i\) be the overall number of measurements taken by sensor nodes. The coefficients 
\(\theta_j\) can be identified by minimizing the mean squared error between the actual and approximated 
measurements. 

For this, let \(Y\) be the \(N \times 1\) matrix that contains these measurements, and let \(\theta\) be the column 
vector of length \(p\) that contains the coefficients \(\theta_j\). Finally, let \(X\) be the \(N \times p\) matrix whose 
columns contain the values of the basis functions for each observation in \(Y\). Using this notation, 
the optimization problem can be stated as 

\[ \theta^* = \arg\min_{\theta} \|X\theta - Y\|^2 \]


which is the standard optimization problem in regression [9]. The optimal coefficients are found 
by setting the gradient of this quadratic objective function to zero, which implies 


\[ X^T X \theta^* = X^T Y \qquad (7.13) \]


Let \(A = X^T X\) and \(b = X^T Y\). \(A\) is referred to as the scalar product matrix and \(b\) as the 
projected measurement vector. In distributed regression, the measurements \(s_i[t]\) do not need to be 
transmitted to the base station. Instead, the matrix \(A\) and vector \(b\) are computed by the aggregation 
service. Once aggregated, the coefficients \(\theta\) can be computed at the base station by solving 
Equation 7.13. 
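
The step from the least-squares objective to Equation 7.13 can be made explicit. The short derivation below uses only the symbols \(X\), \(Y\), and \(\theta\) defined above; it is the standard normal-equations argument, added here as a reminder rather than taken from [8].

```latex
\begin{aligned}
J(\theta) &= \|X\theta - Y\|^2 = \theta^T X^T X \theta - 2\,\theta^T X^T Y + Y^T Y \\
\nabla_\theta J(\theta) &= 2\,X^T X \theta - 2\,X^T Y \\
\nabla_\theta J(\theta^*) = 0 \;&\Longrightarrow\; X^T X \theta^* = X^T Y
\end{aligned}
```

When \(X^T X\) is invertible, the solution is unique and can be written \(\theta^* = (X^T X)^{-1} X^T Y\).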

Let \(X_i\) be the \(N_i \times p\) matrix containing the values of the basis functions for sensor \(i\), and let \(Y_i\) 
be the \(N_i \times 1\) matrix that contains the measurements taken by sensor \(i\). Both \(X_i\) and \(Y_i\) are available 
at sensor \(i\), which can therefore compute locally \(A_i = X_i^T X_i\) and \(b_i = X_i^T Y_i\). 

The matrix \(A\) and the vector \(b\) are actually the sums of the \(A_i = X_i^T X_i\) and of the \(b_i = X_i^T Y_i\). Using this 
fact, \(A\) and \(b\) can be computed by an aggregation service by merging along the routing tree the 
contributions \(A_i\) and \(b_i\) of each sensor. 

Indeed, assuming \(S\) sensors, each of which collects a measurement \(s_i[t]\), we have 

\[ a_{j,j'} = \sum_{i=1}^{S} \pi_j(c_i, t)\, \pi_{j'}(c_i, t) \qquad (7.14) \]

where \(a_{j,j'}\) is the entry at the \(j\)th row and \(j'\)th column of the scalar product matrix \(A\). Similarly, we have 

\[ b_j = \sum_{i=1}^{S} \pi_j(c_i, t)\, s_i[t] \qquad (7.15) \]

where \(b_j\) is the \(j\)th element of the projected measurement vector. All the elements \(a_{j,j'}\) and \(b_j\) can 
be computed by means of an aggregation service, using the following primitives: 


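■ For elements \(a_{j,j'}\): 

\[
\begin{aligned}
\mathrm{init}(i) &= \langle \pi_j(c_i, t)\, \pi_{j'}(c_i, t) \rangle \\
f(\langle S_1 \rangle, \langle S_2 \rangle) &= \langle S_1 + S_2 \rangle \\
e(\langle S \rangle) &= S
\end{aligned}
\]

■ For elements \(b_j\): 

\[
\begin{aligned}
\mathrm{init}(i) &= \langle \pi_j(c_i, t)\, s_i[t] \rangle \\
f(\langle S_1 \rangle, \langle S_2 \rangle) &= \langle S_1 + S_2 \rangle \\
e(\langle S \rangle) &= S
\end{aligned}
\]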


The computation of the matrix \(A\) requires the aggregation of \(p^2\) elements, while the aggregation 
of the vector \(b\) requires the aggregation of \(p\) elements. Once all the elements are retrieved at the 
base station, the set of coefficients \(\theta\) can be computed by solving the system 


\[ A\theta = b \]


The resulting model provides approximations to sensor measurements. Depending on the 
model, approximations may also be obtained for other spatial coordinates or at future time instants 
(cf. Example 7.3). As with the average model, the parameters \(\theta\) may be communicated to all nodes, 
allowing each sensor to locally compute the approximation obtained at the base station. This makes 
it possible to check that approximations are within an error threshold \(\epsilon\) defined by the observer. 
All sensors whose approximations differ from their true measurements by more than \(\epsilon\) may send 
these measurements to the base station. 
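
The whole procedure can be sketched on one machine as follows. This is a minimal illustration, not the implementation of [8]: the function names are hypothetical, the merge is the element-wise sum of the local contributions, and the final system is solved with a plain Gaussian elimination that assumes \(A\) is invertible.

```python
def local_contribution(basis, coords, samples):
    """Return (A_i, b_i) computed locally by one sensor.

    basis: list of functions pi(c, t); samples: list of (t, measurement) pairs.
    """
    p = len(basis)
    A = [[0.0] * p for _ in range(p)]
    b = [0.0] * p
    for t, s in samples:
        x = [pi(coords, t) for pi in basis]
        for j in range(p):
            b[j] += x[j] * s
            for k in range(p):
                A[j][k] += x[j] * x[k]
    return A, b


def merge(record_1, record_2):
    """Aggregation operator: element-wise sum of two (A, b) records."""
    (A1, b1), (A2, b2) = record_1, record_2
    A = [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    b = [u + v for u, v in zip(b1, b2)]
    return A, b


def solve(A, b):
    """Solve A theta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    theta = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * theta[k] for k in range(r + 1, n))
        theta[r] = (M[r][n] - tail) / M[r][r]
    return theta
```

Only the \(p^2 + p\) numbers of the merged record travel up the routing tree; the raw measurements never leave the sensors.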


7.4.3 Discussion 


The main advantage of aggregative approaches is that they represent the variations of sensor 
measurements by means of models whose number of coefficients is independent of the number of 
sensor nodes in the network. This makes these approaches scalable to large networks. Furthermore, 
they distribute the number of radio transmissions evenly among sensor nodes. Compared 
to model-driven and RM approaches, the network load of leaf nodes is increased, whereas the load 
of nodes close to the base station is reduced. Considering that the highest network load primarily 
determines the network lifetime, aggregative approaches are particularly attractive for reducing the 
load of sensor nodes close to the base station. 

The highest network load depends on whether bounded approximation errors are required. 
Retrieving the model coefficients causes a highest network load of 

\[ L_{\max} \sim O(p^2) \]

where \(p\) is the number of parameters of the model. If bounded approximation errors are required, 
the \(p\) coefficients must be communicated to all nodes, causing \(p\) transmissions. Depending on the 
number of sensors for which approximations are more than \(\epsilon\) away from their true measurements, 
an additional number of updates of up to \(S\) may be sent. Denoting by \(L^{check}_{\max}\) the highest network 
load when approximations are checked against the true measurements, we have 

\[ L^{check}_{\max} \sim O(p^2 + S) \]

This upper bound is higher than that of data collection where all measurements are collected, for which 
we had \(L_{\max} \sim O(S)\) (cf. Section 7.1). The distributed regression with bounded errors, therefore, 
may lead to higher communication costs if the model does not properly reflect the variations of 
sensor measurements. 
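
A small numerical illustration may help compare these bounds. The values \(p = 5\) and \(S = 100\) are hypothetical, and the constant factors hidden by the \(O(\cdot)\) notation are ignored.

```latex
\begin{aligned}
p = 5,\; S = 100: \quad & L_{\max} \sim p^2 = 25 \\
& L^{check}_{\max} \sim p^2 + S = 25 + 100 = 125 \\
& \text{default data collection: } L_{\max} \sim S = 100
\end{aligned}
```

Under these assumed values, retrieving only the coefficients costs a quarter of the default load at the most heavily loaded node, whereas the worst case with error checking exceeds it.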

The choice of the model is thus an important issue, particularly when there is no a priori 
information on the type of variations. In practice, a solution may be to collect data from the whole 
network in order to get an overview of the types of measurement patterns. On the basis of this 
initial stage, different models may be tried and assessed at the base station, in order to select a model 
that properly fits the data. In this respect, it is worth mentioning that aggregative approaches can 
also be applied for dimensionality reduction purposes, using the principal component analysis [17] 
and the compressed sensing frameworks [23,30,38]. 

Finally, it is noteworthy that different optimizations can be brought to these approaches. In 
particular, the elements of the matrix A do not depend on sensor measurements. In the case of 
spatial models, they only depend on the spatial coordinates of the sensors. If these coordinates are 
known by the base station, the matrix A may be computed straightaway at the base station, thus 
saving \(O(p^2)\) transmissions. In the same way, the base station may also infer the entries of \(A\) when 
time is involved. The load can therefore be reduced to O(p) if no error threshold is set. 


7.5 Conclusion 


This chapter surveyed the state of the art in the use of learning techniques for reducing the amount 
of communication in sensor networks. Classifying these approaches in three main types, namely 
model driven, RM, and aggregative approaches, we outlined for each of them their strengths and 
their limits. Table 7.1 gives a summary of the different learning schemes in terms of error type and 
highest network load. 


■ Approaches based on model-driven acquisition reduce the highest network load to \(|S_q|\), 
that is, the number of sensors whose measurements are effectively collected. The main 
characteristic of these approaches is that part of the network can remain in the idle mode. 
Model-driven data acquisition, therefore, not only reduces the highest network load, but 
also reduces to a negligible level the energy consumption of the sensor nodes not 
queried. In the idle mode, the energy consumption is about four orders of magnitude lower 
than in the active mode (e.g., 5 μA in the idle mode against 19.5 mA with MCU active for 
the Telos node [28]). 

The subset of sensor nodes whose measurements are collected can be changed over time 
to distribute the energy consumption. Indeed, there exist in most cases different pairs of 
sets of queried and predicted sensors for which the observer's accuracy requirements can be 
satisfied [16]. 

The error type entailed in the obtained predictions depends on whether the model can 
be trusted. If the model is correct, predictions can be bounded with an error threshold 
\(\epsilon\) and a confidence level \(1 - \delta\). However, the drawback of model-driven approaches is 
that unexpected events in the monitored phenomenon may not be detected if they concern 
locations where measurements are predicted. Multiple pairs of sets of queried 
and predicted sensors can be used to address this issue. Indeed, assuming that all sensor 
nodes will at some point be part of the set of queried sensors, an unexpected event will be 
detected when the sensor nodes monitoring the location of the event are queried. The time 
elapsed before the event is detected depends on the frequency at which subsets of sensors are 
changed. This time may be long, and therefore model-driven approaches are not well-suited 
to event detection tasks. 

■ The main characteristic of RM is to guarantee \(\epsilon\)-bounded prediction errors, even in the case 
of unexpected events. These approaches, however, require the sensor nodes to take their 
measurements at every epoch, so that they can be compared with the predicted measurements. 
The energy savings depend on the frequency of updates of the model. In the optimal 
case, the predictions obtained by the model are always within \(\pm\epsilon\) of the true measurements, 
and therefore no communication is needed. The energy savings are in this case around one 
order of magnitude (1.8 mA in the idle mode against 19.5 mA with MCU active for the Telos 
node [28]). 

In the worst case, updates are needed at every epoch. The Highest Network Load 
(HNL, cf. Section 7.1.2) is in \(O(S)\) for this worst-case scenario, and therefore has the same 
order of magnitude as the default data collection scheme. It is worth noting that, 
depending on the number of parameters sent in each model update, the exact amount of 
communication can even be higher with RM than for the default data collection scheme. 


Table 7.1 Comparison of Performances of the Different Modeling 
Approaches. 

Learning Scheme          | Error Type                         | Highest Network Load
Model-driven acquisition | Probabilistic bounded or unbounded | \(L^{MD}_{\max} \sim O(\lvert S_q \rvert)\)
Replicated models        | \(\epsilon\)-bounded               | \(L^{RM}_{\max} \sim O(S)\)
Aggregative approaches   | Unbounded                          | \(L^{DR}_{\max} \sim O(p^2)\)
                         | \(\epsilon\)-bounded               | \(L^{DR\,check}_{\max} \sim O(p^2 + S)\)

■ Finally, aggregative approaches lead to either unbounded or \(\epsilon\)-bounded errors. The type of 
error depends on whether aggregates obtained at the base station are communicated to sensor 
nodes. If no check is made against the true measurements, the highest network load is reduced 
to the number of aggregates collected, that is, \(O(p^2)\), where \(p\) is the number of parameters 
in a model, and optimization techniques can be used to reduce the HNL to \(O(p)\). The main 
characteristic of aggregative approaches with unbounded errors is that the HNL does not 
depend on the number of sensors. This makes these approaches scalable to large networks. 

The model coefficients computed at the base station can be communicated to sensor 
nodes. This allows sensor nodes to compute locally the predictions obtained at the base 
station, enabling aggregative approaches to deal with event detection. The use of this feature 
can, however, be expensive in terms of communications, as an additional network load in 
\(O(p + S)\) can be reached in the worst case. 


In summary, modeling techniques can greatly reduce energy consumption, by up to several orders 
of magnitude with model-driven approaches. The use of models, however, implies some 
approximation of the sensor measurements. For an observer, it is often important that the approximation 
errors are bounded, that is, within \(\pm\epsilon\) of the true measurements. This guarantee is only possible 
with RM and aggregative approaches, which allow the model predictions to be compared with the true 
measurements. 

8.1 Introduction 


According to a World Health Organization (WHO) report [1], over 1 billion people worldwide suffer 
from neuro-disorder diseases (NDDs), ranging from epilepsy to Parkinson's disease. Every year 
over 7 million people die as a result of NDDs [1]. In the United States alone, NDDs cost 
over $148 billion per year, and the annual cost of care for each NDD patient is over $64,000 [2]. 
Before determining the medication or surgical treatment for one of the many different types of NDDs 
(>50 [3]), a neurologist needs to analyze the patient's concrete NDD symptoms, which 
include abnormal gaits (in daytime walking) or motor disorders (mostly occurring at nighttime 
[4-6]). 

Despite the extremely high medical cost of NDDs, up to this point we still rely on labor-intensive 
observations to determine neuro-disorder symptoms. For example, in the initial observation phase 
of epilepsy, a patient often needs to stay in the hospital for at least a few days (each day in hospital 
costs around $1500 in U.S. hospitals today [7]). Today an NDD physician either asks the 
patient to report the symptoms orally, or searches recorded video for each symptom [4]. The oral 
report is unreliable since the patient cannot clearly recall what occurred. The video-based analysis 
still costs an NDD physician significant time and effort. Moreover, a programmable, PTZ-capable 
camera could cost more than $1000, and even a low-resolution video sensor could cost more than $300. 

Therefore, it is critical to design a gait recognition system for accurate capture of NDD symptoms. 
Such an automatic gait monitoring system has to be low-cost, and must use highly motion-sensitive 
sensors and accurate gait pattern recognition algorithms. Pyroelectric sensors have many advantages. 
They are inexpensive, only $3 each. They are very small, only the size of semiconductor tubes. 
A pyroelectric sensor generates very little sensing data (a few bits for each event). No complex 
calculation is needed. Moreover, this low-cost sensor can detect human thermal radiation (8-14 μm) [8] 
with a high sensitivity. It can also accurately capture the angular velocities of a thermal source 
(0.1-3 rad/s). 

We have successfully built an Intelligent Compressive Multi-Walker Recognition and Tracking 
(iSMART) system [9-14]. The neighboring sensors combine their readings to achieve a global view of 
the walkers’ paths through the recognition and tracing of their gaits. It is challenging to 
distinguish among different people’s gait sensing signals when their signals are mixed together. 
Binary signal separation algorithms are needed to associate binary sequences with different 
individuals. iSMART control needs knowledge from multiple disciplines: machine learning, pattern 
recognition, signal processing, computer networking, embedded systems (sensors), and computer 
programming [11]. 

Figure 8.1 shows the basic principle of the iSMART system. A pyroelectric sensor network (PSN) uses 
multi-hop wireless communications to send data to a base station (for centralized data processing) 
or performs localized data processing without sending data to the base station (called distributed 
processing). When a patient passes through the sensor network, the system is able to recognize and 
track the patient (Figure 8.1a). To generate rich visibility modulation from human thermal sources, 
we put a Fresnel lens in front of the pyroelectric sensor (see Figure 8.1b). The lens has different 
hole patterns that filter thermal signals in certain modes. We have tried dozens of different lens 
patterns, and found that the pattern shown in Figure 8.1c is especially sensitive to different gaits, 
that is, different patients generate more discriminative sensor data. The lens is made of plastic 
materials and could be very inexpensive after batch production. Note that Figure 8.1c shows that we 
use one lens to perform signal modulation for multiple sensors simultaneously. Figure 8.1d shows the 
observation data. The shaded parts mean “1” and the blank parts mean “0.” That is, the observed data 
are in binary format (later on we will explain how we get such binary data). Note that Figure 8.1d 
shows a matrix of binary data: each row represents the data from one sensor, and each column 
represents the data from the different sensors at the same time. 

Figure 8.1 (a) PSN for tracking. (b) Pyroelectric sensor with Fresnel lens. (c) Fresnel lens. 
(d) Binary data. 

On “binary” observation data: Although all computer data are in binary format, here “binary 
observation data” do not refer to the computer's digital representation. Instead, we mean that a 
patient is “detected” (represented by “1”) or “not detected” (represented by “0”) at a certain time 
from the perspective of a sensor. The binary generation procedure is shown in Figure 8.2. In 
Figure 8.2a, we can see that when an analog signal from a pyroelectric sensor passes through a Sine 
filter, we retain most of the energy of that signal. We then compare its amplitude to a threshold: if 
it is higher than the threshold, we interpret it as “1”; otherwise, it is “0.” Figure 8.2b shows the 
change of the signal waveform from the original oscillating signals to the interpreted binary data. 
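
The threshold comparison can be sketched as follows. The function name, the use of the absolute value as the amplitude, and the sample values are assumptions for illustration; the filtering stage that precedes the comparison is not reproduced here.

```python
def binarize(filtered_samples, threshold):
    """Map each filtered sample to 1 ("detected") or 0 ("not detected").

    Assumption: the amplitude of an oscillating sample is its absolute value.
    """
    return [1 if abs(x) > threshold else 0 for x in filtered_samples]
```

Applying `binarize` to the output of each sensor yields one row of the binary observation matrix.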

In this chapter, we will report our recent research results on a challenging issue in iSMART: 
context feature extraction for crowded scene patient gait identification. Context information can help 
us quickly locate an object. For instance, when looking for specific objects in complex and cluttered 
scenes, humans and other intelligent mammals use visual context information to facilitate the search, 
by directing their attention to context-relevant regions (e.g., searching for cars in the street, looking 
for a plate on the table). 

In Figure 8.3, the input to the context filter is the binary observations from multiple sensors. 
We divide the stream into different observation windows. Each window is a matrix. We then 
process such a matrix via the context filter, which runs signal projection algorithms (such as 
principal component analysis) to extract dominant features. Based on these features, we then 
determine whether or not this window of data has a region of interest (RoI). Such an RoI could mean 
any scenario, such as A and B passing by. As shown in Figure 8.3, besides passing through the context 
filter, the binary data also go through Bayesian-based patient tracking and recognition algorithms. 
By combining their output with the identified scenarios (i.e., RoIs), we will be able to obtain more 
accurate information such as which person is passing through which location. 

Figure 8.2 Generate binary observation data. (a) Binary data generation via threshold 
comparison. (b) Waveform (from raw signals to binary). 

Figure 8.3 Functional diagram of context awareness. 

The rest of this chapter is organized as follows. Section 8.2 provides an overview of related 
work. The big picture of gait context awareness is given in Section 8.3. Next, in the main body of 
this chapter (Section 8.4), we elaborate on the use of non-negative matrix factorization (NMF) to 
extract patient gait context. Sections 8.5 and 8.6 briefly introduce some basic algorithms to be used 
in performance comparisons between different gait context extraction methods. Section 8.7 details our 
experimental results. Section 8.8 concludes the chapter and mentions some future work. 


8.2 Related Works 


The work closest to ours is gait context awareness in video-based gait recognition and human tracking 
systems. For instance, in Refs. [15,16] the concept of region of interest (RoI) has been defined to 
refer to a target or special scene to be searched for in a complex, large-scale image. They utilize 
humans’ RoI capture behavior: typically humans look at a scene in a top-down approach. That 
is, we first take a glance at the entire scene without carefully looking at each detail. If we find 
an interesting profile, we then look at details to ensure this is what we are looking for (i.e., the RoI). 
In Refs. [17,18] the context concept is enhanced by the definition of saliency, which measures how 
different a local image profile is from the background image. A Bayesian framework is used to 
deduce the saliency and context values. 

This top-down context capture approach [17-19] can also be used for our case. When we receive 
a window of pyroelectric sensor data, we may use fast statistical methods (such as energy functions) 
to see whether this window has different statistics. If so, more accurate context extraction algorithms 
(such as signal projection methods) can be used to extract the signal bases and the corresponding 
basis weights (i.e., coefficients). However, because the binary pyroelectric data do not have a 
visual interpretation (they are only 0/1 binary values), the saliency-based context extraction used in 
traditional video systems cannot be used to capture human gait features. Therefore, we propose to 
use binary-oriented NMF and PCA algorithms to extract the most dominant sensing signal features. 
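
The fast first-pass test can be sketched as follows. The energy measure (fraction of “1” entries), the baseline value, and the tolerance are assumptions chosen for illustration; they stand in for whatever energy function is actually used.

```python
def window_energy(window):
    """Fraction of "1" entries in a binary window (a simple energy measure)."""
    cells = [v for row in window for v in row]
    return sum(cells) / len(cells)


def needs_detailed_analysis(window, baseline_energy, tolerance=0.1):
    """Flag a window whose energy departs from the baseline by more than the tolerance."""
    return abs(window_energy(window) - baseline_energy) > tolerance
```

Only the windows flagged by this cheap test would then be passed to the more expensive signal projection step.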


8.3 Principle of Gait Context Extraction 


Our iSMART system uses a gait sensor cluster architecture as shown in Figure 8.4. Instead of evenly 
distributing sensors everywhere, we deploy sensors into “clusters.” This can fully utilize the sensitive, 
wide-angle thermal detection capability of pyroelectric sensors and thus reduce repeated sensor 
measurements. Moreover, since a microcontroller has multiple ADC (analog-to-digital converter) 
interfaces, by grouping multiple sensors in one cluster, we can use one RF communication board to 
send out multiple sensors’ data, which reduces the hardware cost. Through careful control of each 
sensor's facing direction, we can capture a 360° view of the neighborhood around a cluster. 

Figure 8.4 Multicluster gait sensing. 

Assume a cluster has \(N\) sensors. For such an \(N\)-dimensional data stream, we will use the principle 
shown in Figure 8.5 to identify a new gait context (called context extraction). As shown in Figure 8.5, 
the \(N\)-dimensional data stream first needs to be segmented into different windows. The window size 
depends on how much data a sensor can handle in real time. Here we use an 8 × 16 window size 
to form a binary matrix, called observation data \(X\). Each value is either 1 (meaning “detected”) or 0 
(meaning “not detected”). The context extraction system includes two phases: 


1. Training phase: It is important to identify some common aspects to be compared between 
different scenarios. For instance, in traditional video systems, to identify human faces, we 
typically use eye size, nose length, distance between eyes, etc. to serve as comparison “bases.” 
Likewise, we need to identify some gait “bases” in our PSN, although each basis may not have 
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Figure 8.5 Gait context detection system. 
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physical interpretation of gaits as clear as human faces. Depending on which signal projection 
algorithm we use (such as PCA, NMF, etc.), we could obtain a set of bases for all pyroelectric 
sensor data to be trained. We select windows of binary data for almost all different contexts. 
For each of those contexts, we obtain the corresponding coefficients (called weights) for 
each basis. All contexts’ bases and weights are stored in context template database for testing 
purpose. 

2. Testing phase: When a new window of data comes, to extract the context information from 
this window, we project the data into the bases prestored into the context database and 
calculate the corresponding basis coefficients (weights). We then calculate the similarity level 
between the new calculated weights to the ones in the database. The closest match indicates 
a found “context.” In Figure 8.5, we use H to represent the context weights prestored in the 
database, and use H to represent new tested context weights. In order to visualize the context 
features, we utilize the linear principal regression (LPR) to project the multidimension vectors 
(H or H ) to a two-dimension space (Section 8.6 will have algorithms). 
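
The two phases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bases are taken as given (in practice they would come from PCA or NMF training), the projection is an ordinary least-squares fit, the similarity is Euclidean distance, and the names `project_onto_bases`, `closest_context`, and the toy template labels are hypothetical rather than part of the iSMART system.

```python
import numpy as np


def project_onto_bases(window, bases):
    """Least-squares weights of one flattened window on the stored bases.

    window: (D,) flattened observation window; bases: (K, D) basis vectors.
    """
    weights, *_ = np.linalg.lstsq(bases.T, window, rcond=None)
    return weights


def closest_context(weights, templates):
    """Return the label whose stored weight vector is nearest (Euclidean)."""
    return min(templates, key=lambda label: np.linalg.norm(weights - templates[label]))


# Hypothetical toy data: two bases over a flattened 8 x 16 window (D = 128).
rng = np.random.default_rng(0)
bases = rng.random((2, 128))
templates = {"one_walker": np.array([1.0, 0.0]), "two_walkers": np.array([0.0, 1.0])}

# A real-valued test window built mostly from the second basis (for clarity only;
# actual pyroelectric windows are binary).
window = 0.1 * bases[0] + 0.9 * bases[1]
weights = project_onto_bases(window, bases)
label = closest_context(weights, templates)
```

Here the recovered weights are close to (0.1, 0.9), so the nearest stored template is the second one.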


8.4 Context Awareness Model: Parts-Based Approach 
8.4.1 Hidden Context Pattern for Binary Pyroelectric Sensing Data 


We formulate the gait context identification problem as that of identifying the hidden context 
patterns (HCPs) for given observed sensing data (OSD), which is high-dimensional binary data. 
In particular, we attempt to answer two questions: (1) Since HCPs (matrix H) describe the context 
features of each patient walking scenario, they should be extracted from data X. As a matter of fact, 
we can regard HCPs as the intrinsic sensing characteristics of each cluster of sensors (Figure 8.4). 
How, then, do we identify the real-valued HCPs (matrix H) from each binary-valued OSD X? 
(2) Each OSD can be seen as a mixture of different HCPs (Figure 8.6). How, then, do we 
determine the mixture coefficients (also called "weights") that form matrix W? 

To answer these two questions, we first model the individual gait sensor value 
(either 0 or 1) as a Poisson-distributed random variable. Suppose there are N sensors in each 
cluster, and assume each observation window has M measurements. Thus the OSD is an N × M 
matrix (X). In each position (i, j) of matrix X, the change of its value $X_{ij}$ can be seen as a 
Poisson-distributed random variable with parameter $\lambda_{ij}$, that is, 

$$
X_{ij} =
\begin{cases}
1 & \text{with probability } e^{-\lambda_{ij}} \text{ (detected by sensors, i.e., shaded part, Figure 8.1d)} \\
0 & \text{with probability } 1 - e^{-\lambda_{ij}} \text{ (not detected by sensors, i.e., blank part, Figure 8.1d)}
\end{cases}
\tag{8.1}
$$

Figure 8.6 Gait context pattern models. (Panels: HCPs (H) and weights (W); axes: bit number, detection probability.) 

Figure 8.7 Interpretation of OSD and HCP. 


To reflect the nature of HCPs and weights in Figure 8.6, we define an M-dimensional gait context 
pattern determined by cluster k as a row vector in the context matrix H: 

$$
H_k = (H_{k1}, \ldots, H_{kM}) \ge 0, \quad k = 1, 2, 3, \ldots, K.
$$

Here we assume there are K clusters in total. Note that K is a small number. It is usually chosen 
such that $(N + M) \cdot K < N \cdot M$, so that X can eventually be expressed in a compressed 
format, that is, $X \approx WH$ (Figure 8.7). If we assume that cluster k is the only factor that causes 
a value "1" in X, $H_{kj}$ can be seen as the probability (a real value) of the observation 
$X_{ij} = 1$. In other words, $H_{kj}$ determines how likely observation $X_{ij}$ is to be 1 rather 
than 0. Since it is a probability, it can be a noninteger number between 0 and 1. Therefore, 
if cluster k is the only event that causes a "1" in X, $H_{ka} < H_{kb}$ can be interpreted as follows: 
from the viewpoint of the cluster-k sensors, the object is more likely to be detected at time b than at time a. 
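
As a quick worked check of the compression condition, assume the 8 × 16 window mentioned in Section 8.3 corresponds to N = 8 sensors and M = 16 measurements (this pairing is an assumption made only for illustration):

```latex
(N + M)\,K < N\,M
\;\Longrightarrow\; (8 + 16)\,K < 8 \cdot 16
\;\Longrightarrow\; 24K < 128
\;\Longrightarrow\; K \le 5 .
% Example: with K = 4, W has 8 x 4 = 32 entries and H has 4 x 16 = 64 entries,
% i.e., 96 stored values instead of the 128 entries of X.
```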

Now, suppose each cluster k (k = 1, 2, ..., K) contributes to the detection of an object with a 
different weight, denoted as $W_{ik}$. Then the overall detection probability from those K clusters is 

$$
P(X_{ij} = 1 \mid W_{i1}, \ldots, W_{iK}, H_{1j}, \ldots, H_{Kj})
= \prod_{k=1}^{K} e^{-W_{ik} H_{kj}} = e^{-[WH]_{ij}}
\tag{8.2}
$$

Likewise, the overall missing (i.e., not detected) probability is 

$$
P(X_{ij} = 0 \mid W_{i1}, \ldots, W_{iK}, H_{1j}, \ldots, H_{Kj})
= 1 - \prod_{k=1}^{K} e^{-W_{ik} H_{kj}} = 1 - e^{-[WH]_{ij}}
\tag{8.3}
$$

For convenience, we summarize the conclusion as follows: for any given OSD $X_{ij}$ and K clusters 
of pyroelectric sensors, we have the following detection and missing probabilities: 

$$
\begin{cases}
\text{Detected: } P(X_{ij} = 1 \mid W, H) = e^{-[WH]_{ij}} \\
\text{Missing: } P(X_{ij} = 0 \mid W, H) = 1 - e^{-[WH]_{ij}}
\end{cases}
\tag{8.4}
$$



Figure 8.8 Binary pattern detection. (Labels: each cluster k contributes a detection probability 
$e^{-W_{ik} H_{kj}}$ with weight $W_{ik}$; the combined probability is determined by $(WH)_{ij}$; 
the realization is binary, $X_{ij} \in \{0, 1\}$.) 


Note that although the weight matrix W and context matrix H may contain noninteger numbers, 
the observed data $X_{ij}$ is always binary (0 or 1). Figure 8.8 illustrates this concept. 
Our task now is to seek the context matrix H and weight matrix W that maximize the likelihood 
of the binary observed data matrix X. For a single observation $X_{ij}$, since it is a binary value, 
a natural choice is the Bernoulli distribution, which gives the single-value likelihood 

$$
P(X_{ij} \mid W, H) = \left(e^{-[WH]_{ij}}\right)^{X_{ij}} \left(1 - e^{-[WH]_{ij}}\right)^{1 - X_{ij}}
\tag{8.5}
$$

Therefore, the overall log-likelihood for all values in data matrix X (called the Bernoulli 
log-likelihood) is 

$$
L(W, H) = \sum_{i=1}^{N} \sum_{j=1}^{M}
\left( (1 - X_{ij}) \ln\left(1 - \exp(-[WH]_{ij})\right) - X_{ij} [WH]_{ij} \right)
\tag{8.6}
$$

We should therefore seek non-negative matrices W and H that maximize L, that is, 

$$
(W, H) = \arg\max_{W, H} L(W, H) \quad \text{subject to } W, H \ge 0
$$

The solution for W and H can be obtained with the alternating least squares (ALS) 
algorithm. That is, each element of matrices W and H is updated as follows: 

$$
H_{kj}^{New} \leftarrow H_{kj}^{Old} + \mu_H \frac{\partial L}{\partial H_{kj}}, \qquad
W_{ik}^{New} \leftarrow W_{ik}^{Old} + \mu_W \frac{\partial L}{\partial W_{ik}}
\tag{8.7}
$$

The important idea is to keep one matrix fixed while updating the other. Due to the constraint 
W, H ≥ 0, the step size parameters $\mu_W$ and $\mu_H$ should be chosen carefully. Moreover, ALS 
is only guaranteed to find a local maximum, and only for sufficiently small $\mu_W$ and $\mu_H$. 
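
The alternating update of Equation 8.7 can be sketched as follows. This is an illustrative implementation, not the authors' code: it uses the gradient of Equation 8.6 obtained by the chain rule, clips negative entries to zero to respect W, H ≥ 0, and halves the step until the log-likelihood does not decrease (a simple safeguard added here as an assumption). The small constant `EPS` and all function names are hypothetical.

```python
import numpy as np

EPS = 1e-9  # keeps [WH]_ij away from 0, where ln(1 - exp(-z)) diverges


def bernoulli_loglik(X, W, H):
    """Bernoulli log-likelihood of Equation 8.6."""
    Z = np.maximum(W @ H, EPS)
    return float(np.sum((1 - X) * np.log1p(-np.exp(-Z)) - X * Z))


def gradients(X, W, H):
    """Gradients of L with respect to W and H (chain rule through Z = WH)."""
    Z = np.maximum(W @ H, EPS)
    with np.errstate(over="ignore"):
        G = (1 - X) / np.expm1(Z) - X  # dL/dZ_ij
    return G @ H.T, W.T @ G


def ascent_step(X, W, H, update_w, step):
    """Update one matrix while the other is fixed; halve the step until L does not drop."""
    base = bernoulli_loglik(X, W, H)
    grad_w, grad_h = gradients(X, W, H)
    while step > 1e-12:
        if update_w:
            W_new, H_new = np.maximum(W + step * grad_w, 0.0), H
        else:
            W_new, H_new = W, np.maximum(H + step * grad_h, 0.0)
        if bernoulli_loglik(X, W_new, H_new) >= base:
            return W_new, H_new
        step *= 0.5
    return W, H


def fit(X, W, H, iters=100, step=0.05):
    """Alternate between W and H updates for a fixed number of sweeps."""
    for _ in range(iters):
        W, H = ascent_step(X, W, H, True, step)
        W, H = ascent_step(X, W, H, False, step)
    return W, H


rng = np.random.default_rng(0)
X = (rng.random((8, 16)) < 0.5).astype(float)  # toy binary window
W0 = rng.random((8, 3)) + 0.1
H0 = rng.random((3, 16)) + 0.1
W1, H1 = fit(X, W0, H0)
```

Because a step is accepted only when the log-likelihood does not decrease, the final value is never worse than the starting one, but the result is still only a local maximum.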


8.4.2 Probabilistic NMF Model 


Based on the preceding analysis, our task is to seek a matrix factorization 
$X \approx WH$, where X is the data matrix, W is the weight matrix, and H is the context matrix. 
However, we have the constraints that X is a binary matrix and W, H are non-negative matrices. 
This reminds us of NMF (non-negative matrix factorization) [21]. Given a target's signal (i.e., 
a binary matrix V), NMF finds the basis matrix W and feature matrix H as follows: 

$$
V \approx WH, \quad W \in \mathbb{R}^{m \times d}, \quad H \in \mathbb{R}^{d \times n}
\tag{8.8}
$$


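We then apply an expectation-maximization (EM)-like algorithm to iteratively update the basis 
matrix and feature matrix as follows: 

$$
H_{ak}^{new} \leftarrow H_{ak}^{old} \sum_{i} W_{ia} \frac{V_{ik}}{(WH)_{ik}}, \qquad
W_{ia}^{new} \leftarrow W_{ia}^{old} \sum_{k} \frac{V_{ik}}{(WH)_{ik}} H_{ak}
\tag{8.9}
$$

where i = 1, 2, ..., m, a = 1, 2, ..., d, and k = 1, 2, ..., n. 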
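
For reference, the multiplicative updates can be written compactly with NumPy. The sketch below uses the commonly published normalized form of these updates, in which each update is divided by the column sums of W or the row sums of H; that normalization, the constant `EPS`, and the function names are assumptions added here and are not taken from Equation 8.9 as printed.

```python
import numpy as np

EPS = 1e-12  # avoids division by zero


def kl_divergence(V, WH):
    """Generalized KL divergence sum(V ln(V/WH) - V + WH), using 0 ln 0 = 0."""
    mask = V > 0
    return float(np.sum(V[mask] * np.log(V[mask] / WH[mask])) - V.sum() + WH.sum())


def nmf_kl(V, d, iters=200, seed=0):
    """Multiplicative-update NMF; returns W (m x d), H (d x n), and the cost history."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, d)) + 0.1
    H = rng.random((d, n)) + 0.1
    history = []
    for _ in range(iters):
        R = V / (W @ H + EPS)                             # ratios V_ik / (WH)_ik
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + EPS)   # update H with W fixed
        R = V / (W @ H + EPS)
        W *= (R @ H.T) / (H.sum(axis=1)[None, :] + EPS)   # update W with H fixed
        history.append(kl_divergence(V, W @ H + EPS))
    return W, H, history


rng = np.random.default_rng(0)
V = (rng.random((8, 16)) < 0.4).astype(float)  # toy binary signal matrix
W, H, history = nmf_kl(V, d=3)
```

Both factors stay non-negative because every update multiplies by a non-negative ratio.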
The NMF model mentioned earlier has a shortcoming: it cannot reflect the random nature of 
the error E = X − WH. If we assume E follows a Poisson distribution, we first write down the 
general Poisson distribution: 

$$
f(X \mid \theta) = \frac{\theta^{X} e^{-\theta}}{X!}
\tag{8.10}
$$

where 
θ is the Poisson distribution parameter (its original meaning is the event arrival rate) 
X is the random variable 

For our case, if we assume that 

$$
X = WH + E, \quad \text{where error } E_{ij} \overset{\text{i.i.d.}}{\sim} \text{Poisson}(\theta)
\tag{8.11}
$$

we can then determine the Poisson likelihood as follows: 

$$
L_{Poisson}(\Theta) = P(X \mid W, H) = \prod_{i,j} \frac{[WH]_{ij}^{X_{ij}} \, e^{-[WH]_{ij}}}{X_{ij}!}
\tag{8.12}
$$

For convenience of calculation, we take the natural log: 

$$
L_{Poisson}(\Theta) = \ln P(X \mid W, H)
= \ln \prod_{i,j} \frac{[WH]_{ij}^{X_{ij}} \, e^{-[WH]_{ij}}}{X_{ij}!}
= \sum_{i,j} \left( X_{ij} \ln [WH]_{ij} - [WH]_{ij} - \ln(X_{ij}!) \right)
\tag{8.13}
$$

We use Stirling's formula [22]: 

$$
\ln(X_{ij}!) \approx X_{ij} \ln(X_{ij}) - X_{ij}
\tag{8.14}
$$

Then the log-likelihood becomes 

$$
L_{Poisson}(\Theta) = \ln P(X \mid W, H)
= -\sum_{i,j} \left( X_{ij} \ln \frac{X_{ij}}{[WH]_{ij}} + [WH]_{ij} - X_{ij} \right)
\tag{8.15}
$$

It is very interesting to see that this expression is actually the negative of the Kullback-Leibler (KL) 
divergence between X and WH, that is, 

$$
L_{Poisson}(\Theta) = \ln P(X \mid W, H) = -D_{KL}(X, WH)
\tag{8.16}
$$
As we know, the KL divergence measures the information entropy difference between two random 
variables with different probability distributions. The smaller the KL divergence, the more similar 
the two distributions are. From the preceding equation we can see that maximizing the 
log-likelihood is equivalent to minimizing the KL divergence: 


Seek W, H to minimize $D_{KL}(X, WH)$ 


We can now understand why many NMF applications use the KL divergence as the cost function: 

$$
D_{KL}(X, WH) = \sum_{i} \sum_{j} \left( X_{ij} \ln \frac{X_{ij}}{[WH]_{ij}} + [WH]_{ij} - X_{ij} \right)
\tag{8.17}
$$

Such a KL cost function can give us a better solution for W and H than the conventional NMF cost 
function, which does not lead to unique solutions: 

$$
D_{NMF}(X, WH) = \frac{1}{2} \sum_{i} \sum_{j} \left( X_{ij} - [WH]_{ij} \right)^2
\tag{8.18}
$$
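
To make the two cost functions concrete, the following sketch evaluates Equations 8.17 and 8.18 on a tiny hand-made example. The matrices and function names are illustrative assumptions; the convention 0 ln 0 = 0 is used for zero entries of X, which matters for binary data.

```python
import math


def kl_cost(X, R):
    """D_KL(X, R) of Equation 8.17 for nested lists, with 0 ln 0 taken as 0."""
    total = 0.0
    for x_row, r_row in zip(X, R):
        for x, r in zip(x_row, r_row):
            if x > 0:
                total += x * math.log(x / r)
            total += r - x
    return total


def squared_cost(X, R):
    """D_NMF(X, R) of Equation 8.18: half the sum of squared errors."""
    return 0.5 * sum((x - r) ** 2
                     for x_row, r_row in zip(X, R)
                     for x, r in zip(x_row, r_row))


X = [[1, 0], [0, 1]]                    # toy binary observation
R_good = [[0.9, 0.1], [0.1, 0.9]]       # reconstruction close to X
R_bad = [[0.5, 0.5], [0.5, 0.5]]        # uninformative reconstruction
```

Both costs rank the closer reconstruction `R_good` ahead of `R_bad`, and both vanish when the reconstruction equals the data.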
i j 


As a matter of fact, the cost function in Equation 8.18 comes from the assumption that the error 
E = X − WH obeys a normal (Gaussian) distribution, that is, 

$$
X = WH + E, \quad \text{where error } E_{ij} \overset{\text{i.i.d.}}{\sim} \text{Gaussian}(0, \sigma)
\tag{8.19}
$$

Thus we obtain the Gaussian likelihood: 

$$
L_{Gaussian}(\Theta) = P(X \mid W, H)
= \left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^{NM}
\prod_{i} \prod_{j} \exp\left( -\frac{(X_{ij} - [WH]_{ij})^2}{2\sigma^2} \right)
\tag{8.20}
$$

Taking the natural log, we have 

$$
L_{Gaussian}(\Theta) = \ln P(X \mid W, H)
= -NM \ln(\sqrt{2\pi}\,\sigma) - \frac{1}{2\sigma^2} \sum_{i} \sum_{j} \left( X_{ij} - [WH]_{ij} \right)^2
\tag{8.21}
$$

As we can see, to maximize the likelihood we only need to minimize the second term, which is the 
conventional NMF cost function: 

$$
\text{Maximize } L_{Gaussian}(\Theta) \;\equiv\; \text{Minimize } D_{NMF}(X, WH)
= \frac{1}{2} \sum_{i} \sum_{j} \left( X_{ij} - [WH]_{ij} \right)^2
\tag{8.22}
$$


8.4.3 Seek W, H through Their Prior Distributions 


Although ML (maximum likelihood) can give us the solution for W and H as long as we have enough 
X sample values, in practical pyroelectric sensor network applications we may not be 
able to obtain large samples of X. Moreover, to achieve real-time NMF operation, it is not realistic 
to wait for the collection of large X samples to calculate W and H, and a large X matrix causes 
long calculation time. 

To solve this dilemma, i.e., to seek the matrix factorization $X \approx WH$ with a small 
X matrix, we resort to MAP (maximum a posteriori) estimation, which is defined through Bayes' theorem: 

$$
\text{Posterior probability} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Normalization factor}}
\tag{8.23}
$$

where 
Likelihood is the aforementioned likelihood function for the X matrix 
Prior is the preassumed probability distribution for the parameters of the X distribution 
Normalization factor ensures that the right side integrates to 1 (thus making the posterior 
a valid probability distribution) 


If we use P(Θ) to denote the prior belief (distribution) of the parameters W and H, and assume 
data X is generated by a model with parameters Θ = {W, H}, then Bayes' formula can be 
represented as in Figure 8.9. 

Therefore, our goal is to maximize P(Θ|X). Taking the log-posterior, we have the following: 

$$
\text{Maximize } \ln P(\Theta \mid X) = \text{Maximize } \ln P(W, H \mid X)
= \text{Maximize } \left( \ln P(X \mid W, H) + \ln P(W) + \ln P(H) \right)
$$

Without loss of generality, we assume the sensors' event capture P(X|W, H) follows a Gaussian 
likelihood. We also assume that W and H are independent, that is, 

$$
P(W, H) = P(W) P(H)
\tag{8.24}
$$


Now we need to select proper prior distributions for W and H. For each element of the H matrix, 
we can assume its prior is a uniform distribution as long as its value does not exceed 
an upper threshold (which is the case for the context matrix): 

$$
P(H_{kj}) =
\begin{cases}
\text{const.}, & 0 \le H_{kj} \le H_{max} \\
0, & \text{otherwise}
\end{cases}
\tag{8.25}
$$

Since the posterior distribution belongs to the exponential family, we choose the exponential 
distribution as the prior of the W matrix: 

$$
P(W) = \prod_{i} \prod_{k} \alpha e^{-\alpha W_{ik}}, \quad \alpha > 0
\tag{8.26}
$$

Figure 8.9 Bayesian theorem: P(X, Θ) = P(X|Θ) P(Θ) = P(Θ|X) P(X), where P(X, Θ) is the joint 
distribution, P(X|Θ) the likelihood, P(Θ) the prior, P(Θ|X) the posterior, and P(X) the X 
distribution (evidence; normalization constant). 

Figure 8.10 Maximum a posteriori (MAP). 


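Based on Equations 8.21, 8.25, and 8.26, the log-posterior of Equation 8.23 becomes: 

$$
\begin{aligned}
\ln P(W, H \mid X) &\propto \ln P(X \mid W, H) + \ln P(W) + \ln P(H) \\
&\propto -NM \ln(\sqrt{2\pi}\,\sigma) - \frac{1}{2\sigma^2} \sum_{i} \sum_{j} \left( X_{ij} - [WH]_{ij} \right)^2
 + \sum_{i} \sum_{k} \ln\left( \alpha e^{-\alpha W_{ik}} \right) + \text{Const} \\
&\propto -\frac{1}{2} \left( \sum_{i} \sum_{j} \left( X_{ij} - [WH]_{ij} \right)^2
 + 2\alpha\sigma^2 \sum_{i} \sum_{k} W_{ik} \right) + \text{Const}
\end{aligned}
\tag{8.27}
$$

Therefore, the MAP estimate (i.e., the maximizer of ln P(W, H|X)) can be obtained by minimizing 
an L2-norm with a constraint term (Figure 8.10). Based on Lagrange multipliers, we can convert 
this problem to a special NMF model: 

Given a limited X sample, seek weight matrix W and context matrix H to achieve: 

$$
\text{Minimize } \sum_{i} \sum_{j} \left( X_{ij} - [WH]_{ij} \right)^2,
\quad \text{given constraints: (1) } W, H \ge 0; \quad \text{(2) } \sum_{i} \sum_{k} W_{ik} \le \text{const}
$$

Due to the importance of NMF with a sparseness constraint, we provide a stricter formulation next. 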
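
One way to illustrate the MAP objective is to minimize the squared error plus a penalty on the sum of W, where the penalty weight `lam` plays the role of ασ² in Equation 8.27. The sketch below uses multiplicative updates with the penalty added to the denominator of the W update; this particular solver, the omission of the upper bound on H from Equation 8.25, and all names are assumptions for illustration only.

```python
import numpy as np

EPS = 1e-12  # avoids division by zero


def map_objective(X, W, H, lam):
    """0.5 * ||X - WH||^2 + lam * sum(W): the negative log-posterior up to constants."""
    return float(0.5 * np.sum((X - W @ H) ** 2) + lam * W.sum())


def map_nmf(X, K, lam, iters=300, seed=0):
    """Multiplicative updates for the penalized least-squares NMF objective."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], K)) + 0.1
    H = rng.random((K, X.shape[1])) + 0.1
    start = map_objective(X, W, H, lam)
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + EPS)          # plain least-squares update
        W *= (X @ H.T) / (W @ H @ H.T + lam + EPS)    # penalty shrinks W
    return W, H, start


rng = np.random.default_rng(0)
X = (rng.random((8, 16)) < 0.3).astype(float)  # toy binary window
W, H, start = map_nmf(X, K=3, lam=0.1)
```

Note that without a bound on H, a scale factor can be shifted from W to H, so in practice the bound of Equation 8.25 (or a normalization of H) would also be enforced.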


8.4.4 Context Pattern Seeking under Sparseness Constraint 


One advantage of NMF is the sparseness of its bases. For example, if we extract some common features 
(to form a "basis") from human faces, such as eyes, nose, mouth, and so on, we could use different 
methods (NMF, PCA, wavelets, etc.) to search for those bases. It was found that NMF gives the 
sparsest bases [23]. In Section 8.7 (experiments), we illustrate this point. Sparseness is 
important for reducing memory storage and calculation complexity. More importantly, it 
makes NMF more like "parts-based" feature extraction, i.e., we can easily recognize an object by 
looking at a few of its features. 

The sparseness measure can be regarded as a mapping from $\mathbb{R}^n$ to $\mathbb{R}$ that 
quantifies how much of the energy of a vector is packed into a few components. Without loss of 
generality, we adopt the sparseness definition used in Ref. [23], which considers an observation 
vector X with n elements $(x_1, x_2, \ldots, x_n)$: 

$$
\text{Sparseness}(X) = \frac{\sqrt{n} - \left( \sum_{i} |x_i| \right) / \sqrt{\sum_{i} x_i^2}}{\sqrt{n} - 1}
\tag{8.28}
$$

The range of this sparseness function is [0, 1]: when all elements of X are equal, we have the 
least sparseness of 0; when X has only a single nonzero element, we get the maximum 
sparseness of 1. 
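
The sparseness measure of Equation 8.28 is easy to check numerically. The sketch below is a direct transcription in plain Python; the function name is hypothetical, and the input is assumed to be a nonzero vector with at least two elements.

```python
import math


def sparseness(x):
    """Sparseness of Equation 8.28 for a nonzero sequence with n >= 2 elements."""
    n = len(x)
    l1 = sum(abs(v) for v in x)                 # L1 norm
    l2 = math.sqrt(sum(v * v for v in x))       # L2 norm
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)


dense = sparseness([2, 2, 2, 2])     # all elements equal
single = sparseness([0, 0, 5, 0])    # one nonzero element
mixed = sparseness([3, 1, 0, 0])     # somewhere in between
```

The equal-valued vector gives 0, the single-spike vector gives 1, and the mixed vector falls strictly between the two.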

Our goal is then to use NMF or the methods discussed earlier to decompose X into a weight 
matrix W and a context matrix H with desired degrees of sparseness. The question is: should 
W or H be made sparse? This depends on the application. For instance, if a doctor wants to 
find cancer patterns from a large sample of cases, it is reasonable to assume that cancers are rare 
among all cases. Assume each column of X represents one patient and each row of X represents 
the symptoms of a specific disease. Then we expect the occurrence of a disease (a cancer) to be rare 
(i.e., sparse), that is, the weight matrix W should be sparse. However, the doctor wishes the 
symptoms of each disease to be as detailed as possible, that is, the context matrix H should NOT 
be sparse. 

In our case, we wish to identify the walker's gait patterns. We wish to use "sparse" features 
(represented by context matrix H) to identify gaits. On the other hand, to save cost we do not 
expect to have many sensor "clusters." In other words, the weight matrix W should also be 
sparse. Therefore, both W and H should be sparse. 

Definition (sparse pyroelectric sensing): For a pyroelectric sensor network with K clusters, each 
with N sensors, if we sense N × M data X, the goal of "sparse pyroelectric sensing" 
can be formulated as a matrix factorization procedure: we seek a non-negative weight matrix W 
and context matrix H that minimize the least-squares error $\|X - WH\|^2$. In the 
meantime, W and H should meet two sparseness constraints: (1) Sparseness($w_i$) = $S_w$ for every 
ith row of W, where $S_w$ is the desired W sparseness (preset by the user; range: [0, 1]). 
(2) Sparseness($h_i$) = $S_h$ for every ith column of H, where $S_h$ is the desired H sparseness 
(preset by the user; range: [0, 1]). 

We can use a projected gradient descent NMF algorithm (such as the one used in [20]) to 
seek sparse W and H under this definition. The basic procedure is to take a step in 
the direction of the negative gradient and then project onto the constraint space. The step 
should be small enough to reduce $\|X - WH\|^2$ in each iteration. The critical operation is the 
projection operator that updates W and H when the sparseness constraint (Equation 8.28) is not 
met in step i: 

$$
W_{i+1} = W_i \otimes (X H_i^T) \oslash (W_i H_i H_i^T), \qquad
H_{i+1} = H_i \otimes (W_i^T X) \oslash (W_i^T W_i H_i)
\tag{8.29}
$$

where 
⊗ denotes elementwise multiplication 
⊘ denotes elementwise division 


8.4.5 Smoothing Context Features 


Although sparseness makes NMF more like "parts-based" feature recognition, an overly sparse matrix 
representation cannot accurately describe an object, since most elements of the context matrix H 
will be zero (or very small). Therefore, in some applications, we control the "richness" 
of the context matrix H by adding smoothness constraints to H. Smoothness reduces large 
differences among elements so that values vary "smoothly." Here we define a smoothing 
matrix S as follows [24]: 

$$
S = (1 - \theta) I + \frac{\theta}{q} \mathbf{1}\mathbf{1}^T
\tag{8.30}
$$

where 
I is the identity matrix 
1 is a vector of ones [1 1 1 ...] 
q is the dimension of S (the length of the vector 1) 

The parameter θ is important. Its range is [0, 1]. Let $\tilde{H} = SH$. The larger θ is, the 
stronger the smoothing effect. This can be seen from the following fact: if θ = 0, then 
$\tilde{H} = H$ and no smoothing of H occurs. However, as θ approaches 1 (θ → 1), each column of 
$\tilde{H}$ tends to a constant vector whose elements equal the average of the corresponding 
column of H. Therefore, the parameter θ determines the extent of smoothness of the context matrix. 
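
The effect of the smoothing parameter can be verified on a small example. The sketch below builds S from Equation 8.30 and applies it to a hypothetical 3 × 2 matrix H; the function name and the example values are illustrative only.

```python
import numpy as np


def smoothing_matrix(q, theta):
    """S = (1 - theta) I + (theta / q) 1 1^T for a q x q smoothing matrix."""
    return (1.0 - theta) * np.eye(q) + (theta / q) * np.ones((q, q))


H = np.array([[0.0, 4.0],
              [0.0, 0.0],
              [3.0, 2.0]])                     # hypothetical context matrix

H_none = smoothing_matrix(3, 0.0) @ H          # theta = 0: unchanged
H_half = smoothing_matrix(3, 0.5) @ H          # theta = 0.5: halfway to the column means
H_full = smoothing_matrix(3, 1.0) @ H          # theta = 1: every column is constant
```

With θ = 1 every entry of a column equals that column's mean (here 1 and 2), and intermediate values of θ interpolate linearly between H and that fully smoothed matrix.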

The solution to NMF with smoothness constraints is straightforward. We simply substitute 
W → WS and H → SH in the following NMF iteration equations: 

$$
W_{ia}^{new} \leftarrow W_{ia}^{old} \sum_{k} \frac{V_{ik}}{(WH)_{ik}} H_{ak}, \qquad
H_{ak}^{new} \leftarrow H_{ak}^{old} \sum_{i} W_{ia} \frac{V_{ik}}{(WH)_{ik}}
\tag{8.31}
$$

where i = 1, 2, ..., m, a = 1, 2, ..., d, and k = 1, 2, ..., n. 


8.5 Linear Principal Regression for Feature Visibility 


In order to visualize the context features of different scenarios, we project the multidimensional 
vectors onto a two-dimensional space. Here we utilize linear principal regression (LPR) to 
accomplish the dimensionality reduction. 

The singular value decomposition of a binary signal matrix can be represented by 

$$
S_{m \times n} = U_{m \times m} \Lambda_{m \times n} V_{n \times n}^{T}
\tag{8.32}
$$

where 
U and V are orthogonal matrices 
Λ is a diagonal matrix of singular values 
m is the number of samples 
n is the dimension of samples 

Then S can be approximated using its first k singular values as follows: 

$$
S_{m \times n} \approx T_{m \times k} P_{k \times n}^{T}, \quad
\text{where } T_{m \times k} = U_{m \times k} \Lambda_{k \times k}, \; P_{n \times k} = V_{n \times k}
\tag{8.33}
$$

To find the regression vector in k dimensions, we seek the least-squares solution of the equation 

$$
I_{m \times 1} = T_{m \times k} f_{k \times 1}, \quad
\text{where } f_{k \times 1} = (T^{T} T)^{-1} T^{T} I_{m \times 1}
\tag{8.34}
$$

The regression vector is then 

$$
F_{n \times 1} = P_{n \times k} f_{k \times 1}
\tag{8.35}
$$
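
The projection steps can be sketched with NumPy's SVD routine. This is a minimal illustration under stated assumptions: the scores are taken as T = U_k Λ_k and the loadings as P = V_k, the target vector `y` stands in for the left-hand side of the regression equation (here hypothetical ±1 class labels), and the function name `lpr_fit` is not from the original work.

```python
import numpy as np


def lpr_fit(S, y, k):
    """Rank-k principal regression: returns F (n,), scores T (m, k), and f (k,)."""
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    T = U[:, :k] * s[:k]                       # scores: first k left vectors scaled by singular values
    P = Vt[:k].T                               # loadings: first k right singular vectors
    f, *_ = np.linalg.lstsq(T, y, rcond=None)  # least-squares solution of y = T f
    return P @ f, T, f


rng = np.random.default_rng(0)
S = (rng.random((6, 10)) < 0.5).astype(float)       # 6 toy binary samples of dimension 10
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])     # hypothetical labels
F, T, f = lpr_fit(S, y, k=2)
scores_2d = T                                        # each row is a sample in the 2D space
```

With k = 2, the rows of `T` give the two-dimensional coordinates used for plotting, and applying `F` to a sample reproduces the fitted regression value for that sample.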




8.6 Similarity Score for Context Understanding 


During gait context understanding (testing phase), we use k-means clustering and vector distance to 
compute the similarity score between the stored context weights $H$ and the newly tested weights 
$\hat{H}$: 

$$
\arg\min_{C} \sum_{i=1}^{k} \sum_{h_j \in C_i} \left\| h_j - \mu_i \right\|^2
\tag{8.36}
$$

where 
$h_j$ is a context feature vector 
$\mu_i$ is the mean of cluster i 
$C_1, C_2, \ldots, C_k$ are the k clusters 


Scenario identification decision: For a context feature $h_j$, we have k + 1 hypotheses to test for 
k registered scenarios, i.e., $\{H_0, \ldots, H_k\}$. The hypothesis $H_0$ represents "none of the 
above." The decision rule is then 

$$
h_j \in
\begin{cases}
H_0, & \text{if } \max_i \, p(h_j \mid H_i) < \gamma \\
H_i : i = \arg\max_i \, p(h_j \mid H_i), & \text{otherwise}
\end{cases}
\tag{8.37}
$$

where 
$p(h_j \mid H_i)$ is the likelihood of $h_j$ being associated with the ith hypothesis 
γ is a selected acceptance/rejection threshold 
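
The decision rule can be illustrated with a small sketch. The chapter does not specify the form of the likelihood, so the example below assumes a Gaussian-shaped score based on the squared distance to each cluster mean; that choice, the `scale` parameter, and the function names are hypothetical.

```python
import math


def likelihood(h, mean, scale=1.0):
    """Assumed likelihood: exp(-||h - mean||^2 / (2 scale^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(h, mean))
    return math.exp(-d2 / (2.0 * scale ** 2))


def identify(h, cluster_means, gamma):
    """Return 0 for 'none of the above', else the 1-based index of the best scenario."""
    scores = [likelihood(h, mu) for mu in cluster_means]
    best = max(range(len(scores)), key=scores.__getitem__)
    return 0 if scores[best] < gamma else best + 1


means = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]      # hypothetical cluster means
near_second = identify((2.8, 0.1), means, gamma=0.5)
far_away = identify((10.0, 10.0), means, gamma=0.5)
```

A feature close to the second cluster mean is assigned to scenario 2, while a feature far from every mean falls below the threshold and is rejected as "none of the above."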


8.7 Experiment Results 


In the following discussion, we provide part of our experimental results. For more detailed 
descriptions of our results, please refer to our other publications [9-14]. 


8.7.1 Gait Context Identification via Traditional PCA Signal Projection 


Using real application data from our iSMART platform, we first test the use of a traditional, 
nonbinary PCA algorithm for signal projection in order to find hidden gait context patterns. 
Here we aim to find four different gait contexts: (1) 1-patient scenario, (2) 2-patient scenario (two 
patients' gait signals are mixed together), (3) 3-patient scenario, and (4) 4-patient scenario. As we 
know, PCA performs both pattern recognition and dimension reduction. Since we use general 
PCA, which is not optimized for binary data, the context identification accuracy and calculation 
complexity would be high for our binary data matrix. Figure 8.11 shows the bases we found. 

As we can see from Figure 8.11, the four contexts share four common bases. Based on the 
weights (coefficients) for each basis, we are able to detect different contexts. To better 
see the differences among contexts, we use PCA to project the context weights onto a 2D 
coordinate system. As shown in Figure 8.12, PCA clearly maps each context to a different cluster. 

Following general pattern recognition methodology, we use the receiver operating characteristic 
(ROC) graph to illustrate context identification accuracy. Figure 8.13 shows the similarity score 
distributions for self-testing and cross-testing when using PCA. Self-testing yields true positive 
data and cross-testing yields false positive data; the two distributions can be 
almost completely separated from each other. This indicates that PCA can efficiently recognize 
different contexts. 

Figure 8.11 Contextual bases achieved by principal components analysis. (a) Eigen base 1. 
(b) Eigen base 2. (c) Eigen base 3. (d) Eigen base 4. 

Figure 8.12 Feature clusters of four scenarios using principal components analysis. 


8.7.2 Gait Context Identification Using Probabilistic Matrix Factorization Model 


We then use the probabilistic NMF to identify those four contexts. As discussed in Section 8.4, 
NMF is a parts-based signal projection. It can use simpler bases (than PCA) to identify a context. 
Our experiment verifies this point. Figure 8.14 clearly shows that the bases generated from NMF 
are sparser than in the PCA case (Figure 8.11). This is beneficial for saving memory storage and 
reducing weight calculation time. As a matter of fact, because NMF uses non-negative 
bases and weights, while PCA allows negative weights, NMF can interpret signal features in an 
intuitive way. 

Figure 8.13 Similarity score distributions of self-testing and cross-testing using PCA. 

Figure 8.14 Region of interest of NMF. (a) Base 1. (b) Base 2. (c) Base 3. (d) Base 4. 

Figure 8.15 Feature clusters of four scenarios using non-negative matrix factorization. 

We again use NMF to project the pyroelectric sensor data for the cluster architecture. As shown in 
Figure 8.15, NMF can completely separate the different contexts. Moreover, compared to the PCA case 
(Figure 8.12), NMF makes contexts of the same category converge into a smaller 
cluster, which indicates the advantage of NMF over PCA from the context identification viewpoint. 

We then tested the similarity scores (explained in Section 8.6) in the NMF case. As shown in 
Figure 8.16, the two distributions (self-testing and cross-testing) can be completely separated from 
each other. Moreover, compared to the PCA case (Figure 8.13), the self-testing scores occupy only a 
small region while the cross-testing scores spread over a larger region. These results show that NMF 
can detect contexts more accurately than PCA. 

Then we compare the ROC graphs of PCA and NMF. As we can see in Figure 8.17, the detection rate 
of NMF is always 1 regardless of the false alarm rate, whereas that of PCA is below 1 at certain 
false alarm rates. 
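
An ROC curve of this kind can be computed from the two score distributions by sweeping a threshold. The sketch below assumes that a higher similarity score means a better match, so self-testing scores count as positives and cross-testing scores as negatives; the score values are made up for illustration and are not experimental data.

```python
def roc_points(self_scores, cross_scores):
    """Return (false alarm rate, detection rate) pairs for every distinct threshold."""
    thresholds = sorted(set(self_scores) | set(cross_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in self_scores) / len(self_scores)
        fpr = sum(s >= t for s in cross_scores) / len(cross_scores)
        points.append((fpr, tpr))
    return points


# Hypothetical scores: fully separated versus overlapping distributions.
separated = roc_points([0.9, 0.8, 0.85], [0.2, 0.3, 0.1])
overlapping = roc_points([0.9, 0.4, 0.85], [0.2, 0.5, 0.1])
```

When the two distributions are completely separated, the curve passes through the point (0, 1), i.e., a detection rate of 1 with no false alarms; overlapping distributions never reach that point.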


8.7.3 NMF with Smoothness or Sparseness Constraints 


To test the performance of constrained NMF (see Sections 8.4.4 and 8.4.5 on sparseness and 
smoothness), we used real pyroelectric sensor data to perform an ROC test. The result is shown 
in Figure 8.18. It shows that NMF under smoothness constraints has the best performance. This 
could be due to two reasons: 


1. Scenario-dependent context prefers smoothness constraints: Two kinds of hidden context patterns 
can be extracted via the NMF algorithm: one is scenario-dependent context (i.e., how many 
walkers are in the scenario simultaneously); the other is path-dependent context (i.e., the 
same walker changes paths each time he/she walks through the sensors). The former 
(scenario-dependent) generates holistic context patterns (i.e., all NMF weights tend to be more 
evenly distributed), while the latter (path-dependent) generates local context characteristics (i.e., 
NMF weights tend to be distributed at two extremes). Our experiments chose 
the former (scenario-dependent) as the context identification objective. Therefore, adding a 
smoothness constraint makes the NMF weights look more holistic (i.e., all weights become 
more evenly distributed), which makes context extraction more convenient. 

2. Smoothness constraint fits our k-means clustering algorithm: We adopted k-means clustering 
and vector distance in the calculation of the similarity score function. Due to the various value 
positions in the feature vectors (i.e., NMF weights) of the same cluster, the k-means algorithm 
tends to smooth the feature vectors. In other words, k-means tends to reduce the sparseness 
of the NMF weights, which works against the sparseness constraint. Therefore, 
adding a sparseness constraint to the NMF weight matrix yields worse performance. In 
contrast, adding a smoothness constraint is consistent with the function of the k-means clustering 
algorithm, which brings better context identification performance. 

Figure 8.16 Similarity score distributions of self-testing and cross-testing using non-negative 
matrix factorization. 

Figure 8.17 Receiver operating characteristics of PCA and NMF. 

Figure 8.18 ROC of NMF with the constraints of sparseness and smoothness. 


8.7.4 Pseudo-Random Field of Visibility Modulation 


We have also investigated the use of pseudo-random modulation of the field of view (explained 
in Section 8.7) to enhance context identification performance. All the earlier experiments were 
implemented with regular visibility modulation. In this section, we use Hadamard code-based 
pseudo-random visibility modulation (see Table 8.1). The visibility mask is encoded by 
replacing each "-1" in a Hadamard matrix with "0." 
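
The encoding step (replacing each -1 with 0) can be sketched as follows. The example assumes the Sylvester construction, which produces Hadamard matrices whose order is a power of two; it illustrates only the mapping itself and does not reproduce the 14-bit codes listed in Table 8.1. The function names are hypothetical.

```python
def sylvester_hadamard(order):
    """Hadamard matrix of the given power-of-two order (Sylvester construction)."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    rows = [[1]]
    while len(rows) < order:
        # Block form [[H, H], [H, -H]].
        rows = [r + r for r in rows] + [r + [-v for v in r] for r in rows]
    return rows


def visibility_masks(order):
    """Binary masks obtained by replacing each -1 in the Hadamard matrix with 0."""
    return [[1 if v == 1 else 0 for v in row] for row in sylvester_hadamard(order)]


masks4 = visibility_masks(4)
masks8 = visibility_masks(8)
```

Apart from the all-ones first row, every mask has exactly half of its bits set, which gives each sensor a balanced but distinct visibility pattern.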

Figure 8.19 shows the similarity score comparison. As we can see, the two distributions are 
completely separated from each other, which indicates a perfect context identification rate. 

Figure 8.20 shows the ROC comparison. We can see that the context identification rate 
is always 100% in the pseudo-random visibility case. However, in the regular visibility case, the 
identification rate is below 100% when the false alarm rate is below 0.3. Therefore, pseudo-random 
visibility modulation can better capture data patterns and improve the context identification 
performance. 

Table 8.1 Coding Scheme for Visibility Modulation 

Sensor Number    Regular Visibility    Pseudo-Random Visibility 
1                [11100000000000]      [10101010101010] 
2                [01111000000000]      [11001100110011] 
3                [00011110000000]      [10011001100110] 
4                [00000111100000]      [11110000111100] 
5                [00000001111000]      [10100101101001] 
6                [00000000011110]      [11000011110000] 
7                [00000000001111]      [10010110100101] 
8                [00000000000111]      [11111111000000] 

Figure 8.19 Similarity score distributions using regular visibility and pseudo-random visibility 
modulations. 


8.8 Concluding Remarks and Future Work 


In this chapter, we have presented our recent research results on gait context identification in mobile 
patient tracking scenarios, which are monitored through extremely low-cost pyroelectric 
sensors. Those sensors form a multicluster network for binary data collection ("1" means detected; 
"0" means not detected). Our next-step research will focus on two aspects: (1) We will 
determine a good window size for segmenting the incoming binary data. The window size should 
not sacrifice real-time context identification performance. (2) We will investigate a distributed 
NMF implementation in the sensors instead of sending all data to the base station for centralized 
processing. Such distributed context identification can greatly reduce network processing overhead 
and delay. 

Figure 8.20 ROC from regular visibility and pseudo-random visibility modulations. 


9.1 Introduction to Wireless Sensor Networks 


A wireless sensor network (WSN) consists of a set of sensor nodes that are deployed in a field and 
interconnected through a wireless communication network. In general, the nodes have short-range 
communication capability. They cooperatively monitor physical or environmental conditions, 
such as vibration, motion, temperature, sound, etc. Each of these scattered sensor nodes can 
collect data and route the data back to the sink/base station [1,2]. 

Each sensor node comprises a sensing unit, data processing unit, communication unit, and 
power unit. There may be additional components such as a localization unit, energy producer, 
position changer, etc., depending on the application. The general architecture of a sensor node is 
shown in Figure 9.1. 

Figure 9.1 Basic units of a typical wireless sensor node. (Recoverable labels: sensing unit, processing unit.) 


The strength of WSNs lies in their flexibility and scalability. The nodes are capable of forming 
self-organized networks using multihop communications. In many applications, it is impractical 
to recharge nodes after they are deployed, as WSN nodes do not receive personal human interac- 
tion/care and usually get deployed in the field at random unknown locations. Thus, sensor nodes 
show strong dependence on battery life. In WSN, each node plays the dual role of data collector 
and data router. Hence, malfunctioning of a few nodes can cause significant topological changes. 
It might require rerouting of packets and reorganization of the complete network. Therefore, 
energy efficient communication is a very significant constraint in WSN, and thus computational 
operations of nodes and communication protocols must be made as energy efficient as possible. 

Traditionally, almost all WSNs operate in unlicensed frequency bands, which are also used by
other wireless technologies such as Wi-Fi, Bluetooth, WiMAX, and ZigBee. Other equipment
such as microwave ovens and cordless phones also operates in those bands. This makes the unlicensed
bands overcrowded, and the resulting spectrum scarcity is one of the significant problems
for WSNs. With the rapid growth of ubiquitous low-cost wireless hardware that uses
the unlicensed spectrum, demand and competition for spectrum increase. Hence, network-wide
performance degradation of traditional WSNs is expected, and the condition may be more severe
in populated urban areas.

The coexistence issues in unlicensed bands have been the subject of extensive research [3,4]. 
In particular, it has been shown that IEEE 802.11 networks [5] can significantly degrade the 
performance of ZigBee/802.15.4 networks when operating in overlapping frequency bands [4]. 

Also, WSNs currently lack the capability to fine-tune their radio configuration parameters
dynamically to meet the challenges of a dynamic operating environment, for instance, floods,
seasonal changes in vegetation, or spectrum congestion. Wireless channel properties also keep
changing randomly, and other wireless devices operating in the same environment add to the
variability seen by the WSN. This may result in degraded radio link performance
and unreliable network connectivity [6].


9.1.1 Common Design Issues in WSN 


Routing: Routing is the process of selecting paths in a network along which network traffic is to be
sent. In a WSN, the topology changes dynamically, either because sensor nodes move in the field or
because a node exhausts its available energy. Moreover, the channel in use may sometimes go
into a deep fade, which then requires a change of route.

Fusion of information with fault tolerance: Information fusion deals with the combination of
information from the same source or from different sources to obtain an improved fused estimate
of greater quality or greater relevance.

Security in WSN: Information based on sensed data can be used in agriculture and livestock, 
assisted driving, or even in providing security at home or in public places. A key requirement from 
both the technological and commercial points of view is to provide adequate security capabilities. 

Scalability: It may happen that some nodes die or some more nodes join the existing wireless 
sensor network. The sensor network should adapt to the changes in the network size, node density, 
and topology. 

Deployment of network: Deployment means setting up an operational sensor network in a real 
world environment. Sensor nodes can be deployed either by placing one after another in a sensor 
field in a deterministic manner or by dropping them from a plane randomly. Various deployment 
issues need to be taken care of because they directly affect routing and localization. 

Spectrum scarcity: The rapid growth of wireless applications that work in the unlicensed band has
created spectrum scarcity, as WSNs also operate in that band. Too much interference, or the
unavailability of a channel when it is needed, may lead to a disastrous situation.

Localization of sensor nodes: Sensed data are meaningless without knowledge of the sensor location;
hence, localization deals with determining the locations of wireless devices (sensor nodes) in a
WSN. The challenge is that either a GPS-based mechanism or a local
positioning-based mechanism has to be deployed.


9.2 Cognitive Radio 


Traditionally, wireless networks run under a fixed spectrum assignment policy regulated by
government agencies. Spectrum is assigned to service providers on a long-term basis for large
geographical regions, and only licensed users are allowed to use it. However, Federal
Communications Commission (FCC) measurements have indicated that many licensed frequency bands
remain unused 90% of the time [7]. In order to better utilize the licensed spectrum, the FCC
has launched a secondary market initiative [8], the goal of which is to remove regulatory
barriers and facilitate the development of secondary markets in spectrum usage rights among the
wireless radio services. This introduces the concept of dynamic spectrum allocation, a
technology that senses open channels and allows devices to communicate
in underused parts of the spectrum. These underused parts, also called spectrum
holes, can be visualized as in Figure 9.2. Dynamic spectrum allocation implicitly requires the use
of cognitive radios to improve spectral efficiency. The inefficient usage of the existing spectrum
can be improved through opportunistic access to the licensed bands without interfering with the
primary users (PUs). Cognitive radio (CR) is an intelligent wireless communication system that is
aware of its surrounding environment. The idea is to use intelligent signal processing and decision
making to enable the radio not just to utilize the spectrum efficiently but also to manage and
adapt its other wireless parameters. Such radios dynamically reconfigure center frequency, waveform
design, time diversity, and spatial diversity options. A CR dynamically adapts the transmission or
reception parameters of either a network or a node to achieve efficient communication without
interfering with PUs. In simple terms, a CR network consists of primary users and secondary users
(SUs). The PUs are the licensed users and hence have the exclusive right to access the radio
spectrum, whereas the SUs (cognitive users) are unlicensed users that can opportunistically access
free spectrum bands without causing harmful interference to PUs.

Figure 9.2 A typical spectrum occupancy chart.

Figure 9.3 Basic units of a typical cognitive radio.

Radios that are capable of such intelligent decisions are called cognitive radios. These radios
observe their wireless environment, apply intelligent algorithms and computational
learning methods, and take actions accordingly to optimize the behavior of the network. A simple
model of a CR, in which the cognitive engine interacts with the radio, is shown in Figure 9.3.


9.3 Cognitive Radio-Based WSN 


In both CR and WSN, sensing tasks are performed to collect information from the operating
environment, about spectrum occupancy and environmental parameters, respectively, and
appropriate actions are then taken. A CR-based WSN applies CR to the WSN
not only in the PHY and MAC layers; it exploits cognition in all the layers and
thus employs a cross-layer approach. Adopting CR technology is both promising and challenging
for WSNs. One of the biggest advantages is that it enables the WSN to sense spectrum holes and
utilize the vacant frequencies to improve spectrum utilization. The WSN can also increase its own
quality of service (QoS) and throughput by adaptively and cognitively changing transmission
and reception parameters such as transmitted power, operating frequency, modulation, pulse shape,
symbol rate, coding technique, and constellation size. A wide variety of data rates and QoS
levels can thus be achieved, which helps reduce power consumption and extend network lifetime
in a WSN.

CR technology can provide access not only to new spectrum bands but also to spectrum bands
with better propagation characteristics. Generally, lower frequencies have better propagation
characteristics than higher ones. Operating WSNs at lower frequency bands allows range
extension and higher energy efficiency. Because the transmission range of a node at a given
transmit power increases at lower frequencies, fewer sensor nodes are needed to cover a given
area and the topology becomes simpler.
Some of the advantages of using low frequencies for transmission are

Higher transmission range
Fewer sensor nodes required to cover a specific area
Lower energy consumption
Fewer hops to the destination
Lower end-to-end delay


Higher transmission range improves several important factors in WSNs including network 
connectivity, lifetime, and end-to-end delay. 
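
The range argument can be made concrete with a small calculation. The sketch below assumes the free-space (Friis) propagation model with unity-gain antennas, which is an assumption of this example and not a model named in the chapter; real deployments see additional loss from obstacles and ground effects. The transmit power, receiver sensitivity, and the two frequencies are hypothetical values.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def free_space_range_m(tx_power_dbm, rx_sensitivity_dbm, freq_hz):
    """Maximum free-space range for unity-gain antennas (Friis equation).

    Free-space path loss in dB is 20*log10(4*pi*d*f/c); the range is the
    distance d at which that loss equals the available link budget.
    """
    link_budget_db = tx_power_dbm - rx_sensitivity_dbm
    return SPEED_OF_LIGHT / (4 * math.pi * freq_hz) * 10 ** (link_budget_db / 20)


# Hypothetical node: 0 dBm transmit power, -90 dBm receiver sensitivity.
range_2g4 = free_space_range_m(0, -90, 2.4e9)
range_600m = free_space_range_m(0, -90, 600e6)
print(f"2.4 GHz: {range_2g4:.0f} m, 600 MHz: {range_600m:.0f} m")
print(f"range ratio: {range_600m / range_2g4:.1f}")
```

Under this model the range is inversely proportional to frequency, so moving from 2.4 GHz to 600 MHz at the same transmit power multiplies the range by four, which is what reduces the node count and hop count.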

Another advantage of using CR technology in wireless sensor networks is that data from
sensor nodes that are not spatially correlated, or that have low redundancy, can be transmitted
to the sink simultaneously on different free channels. This reduces
delay and enables the sink to monitor a large amount of spatially uncorrelated information in
real time.


9.4 CWSN Architecture 


A cognitive radio-based WSN (CWSN) resembles a conventional WSN in that it consists of several
tiny sensor nodes subject to all the constraints of a normal WSN, especially “limited battery
energy.” The two differ in their transceiver hardware architecture and states. In a CWSN, the
hardware also includes a cognitive module that is responsible for spectrum sensing and for
adaptively changing the transmission parameters of a reconfigurable transceiver (the
reconfigurable transceiver is another advantage of the CWSN). The cognitive module also contains
a cognitive engine, which is responsible for learning and achieving cognition so that changes are
made automatically, without human intervention. This cognitive engine also controls
the changes that take place in the transmitter and receiver.

The cognitive engine works on the concept of the cognition cycle (CC) [9]. The states of the CC
are shown in Figure 9.4. The cycle comprises six main states: observe, orient, plan, decide, act,
and learn. It enables the nodes to achieve context awareness and intelligence, so that they are
aware of their operating environment, can sense for white spaces, and can use them in an
intelligent and efficient manner. In a WSN, the cognitive engine may also assist in intelligent
localization, routing, and scheduling.

CWSN nodes have an additional state, the sensing state, in which they keep sensing their
spectrum environment. The spectrum sensing state consumes a lot of energy because it is directly
related to transmission and reception, which are the most energy-consuming activities of a
CWSN/WSN node (Figure 9.5) [10].

Spectrum sensing can be done either in a distributed fashion or in a centralized fashion, as
shown in Figure 9.6; the sensing schemes themselves are discussed in the next section.
In the distributed scheme, all the nodes sense the spectrum environment on their own and
compete with other sensor nodes to grab the unoccupied spectrum. The problem with this
architecture is that every node needs a spectrum sensing module, which may not be
economically feasible.
Figure 9.4 Cognitive cycle. (From Howitt, I. and Gutierrez, J.A., IEEE 802.15.4 low rate-wireless
personal area network coexistence issues, Wireless Communications and Networking, 2003,
WCNC 2003, IEEE, vol. 3, New Orleans, Louisiana, USA, pp. 1481-1486, 16-20
March 2003.)

Figure 9.5 Simplified diagrammatic comparison of WSN and CWSN. (From Mitola III,
J. and Maguire, G.Q., Cognitive radio: Making software radios more personal, Personal
Communications, IEEE, 1999.)

Figure 9.6 Distributed and centralized spectrum sensing. (From Cavalcanti, D. et al., Cognitive
radio based wireless sensor networks, Proceedings of the 17th International Conference on
Computer Communications and Network, IEEE Press, Washington, DC, pp. 1-6, 2008.)


In the centralized scheme, a single network coordinator is responsible for
spectrum sensing as well as spectrum scheduling, that is, allotting the free channels to the
nodes that need them, in order of priority.

Centralized control has a difficulty, however: it needs a separate
control channel. The channel switch command is broadcast on this control channel, and the
sensor nodes change their Tx/Rx frequencies accordingly. The control channel can be taken from a
licensed band or from the Industrial, Scientific, and Medical (ISM) band. But there is always the
risk that the control channel fades, in which case the entire network falls into chaos.

In a common WSN scenario with a large number of sensor nodes, it may not be
feasible to have a spectrum sensing module on each node. A feasible architecture is to make
only a few specialized nodes capable of spectrum sensing. In a realistic deployment, the network
coordinator or the cluster head can do this task. In a deterministic deployment scenario
such as a home or an office, a few specialized nodes can easily be deployed
specifically for spectrum sensing [12]. Since spectrum sensing is a repetitive process that
consumes extra energy from battery-powered sensors, implementing it in all nodes
of a WSN may not be efficient in terms of energy consumption.

The target frequencies to be sensed can be fixed or can vary, depending on the changing
environmental and mobility conditions of the nodes. Here, the cognitive engine can play an
important role. After collecting spectrum sensing data over a long period of time, the cognitive
engine will know which channel or band has to be sensed at what time, instead
of sensing all the channels every time. The channel occupancy pattern also varies spatially. Hence,
the cognitive engine can help the spectrum sensing nodes decide which channels to
sense in a particular period of time and in a particular geographical region.
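
One simple way to picture this learning step is a table of past sensing outcomes indexed by time of day and channel. The sketch below is only an illustration under assumed names (`OccupancyHistory`, `channels_to_sense`) and an assumed hourly granularity; the chapter does not prescribe a data structure, and a region identifier could be added to the key to capture the spatial variation mentioned above.

```python
class OccupancyHistory:
    """Per-(hour, channel) busy/idle counts gathered from past sensing."""

    def __init__(self):
        self._busy = {}
        self._total = {}

    def record(self, hour, channel, busy):
        key = (hour, channel)
        self._total[key] = self._total.get(key, 0) + 1
        if busy:
            self._busy[key] = self._busy.get(key, 0) + 1

    def idle_probability(self, hour, channel):
        key = (hour, channel)
        total = self._total.get(key, 0)
        if total == 0:
            return 0.5  # no observations yet: assume an even chance
        return 1.0 - self._busy.get(key, 0) / total

    def channels_to_sense(self, hour, channels, k):
        """Return the k channels most likely to be idle at this hour."""
        ranked = sorted(channels,
                        key=lambda ch: self.idle_probability(hour, ch),
                        reverse=True)
        return ranked[:k]


history = OccupancyHistory()
for _ in range(8):
    history.record(hour=9, channel=11, busy=True)    # always busy at 09:00
    history.record(hour=9, channel=15, busy=False)   # always idle at 09:00
for busy in (True, False, False, False):
    history.record(hour=9, channel=20, busy=busy)    # idle 3 times out of 4
print(history.channels_to_sense(9, [11, 15, 20, 25], k=2))
```

With such a table the sensing node spends its energy on the channels that have historically been free at that time, rather than scanning every channel in every cycle.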

The sensing duration depends on the required sensing accuracy and the required probability of
detection. A longer sensing duration, or a more complex sensing algorithm, consumes more
energy, so there has to be a trade-off between sensing duration and sensing accuracy.
We will also discuss a mechanism that uses an appropriate energy budget for spectrum sensing
according to the requirement/priority. During sensing, there has to be a network-wide quiet
period; that is, nodes have to suspend their transmissions according to a schedule that the
coordinator broadcasts well ahead of time to avoid any overlap. Otherwise, false alarms
may occur, in which a channel is reported as occupied because of the transmissions of the
sensing node itself or of some of the member sensor nodes.

Spectrum sensing is performed by the network coordinators only during network-wide quiet
periods (QPs). This is usually done to detect incumbents at low incumbent detection threshold
(IDT) values and to avoid false alarms. All nodes are given a fixed schedule of quiet periods
that they must follow. These quiet periods can be scheduled well ahead of time through a
broadcast by the network coordinator, so that all nodes can adjust their transmissions to avoid
overlaps with the scheduled quiet periods [11]. This requires tight coordination of quiet
periods among the nodes in the WSN [13].
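
How a node might respect such a schedule can be sketched as follows. The example assumes strictly periodic quiet periods described by a period, a quiet duration, and an offset; these parameters and the function name are hypothetical, since the text only says that the schedule is fixed and broadcast in advance.

```python
def earliest_tx_start(t_ms, tx_ms, period_ms, quiet_ms, offset_ms=0):
    """Earliest start time >= t_ms at which a transmission lasting tx_ms
    fits entirely outside the periodic quiet periods.

    Quiet periods occupy [offset + k*period, offset + k*period + quiet_ms).
    """
    if tx_ms > period_ms - quiet_ms:
        raise ValueError("transmission does not fit between quiet periods")
    phase = (t_ms - offset_ms) % period_ms
    if phase < quiet_ms:
        # Currently inside a quiet period: wait until it ends.
        return t_ms + (quiet_ms - phase)
    if phase + tx_ms > period_ms:
        # Would run into the next quiet period: wait until that one ends.
        return t_ms + (period_ms - phase) + quiet_ms
    return t_ms


# Hypothetical schedule: a 50 ms quiet period every 1000 ms.
print(earliest_tx_start(10, 100, 1000, 50))    # inside a quiet period
print(earliest_tx_start(500, 100, 1000, 50))   # free to send immediately
print(earliest_tx_start(950, 100, 1000, 50))   # deferred past the next one
```

Because every node evaluates the same broadcast schedule locally, no per-packet signaling is needed to keep the channel silent while the coordinator senses.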

QPs may be a critical issue for high-throughput networks, but in a WSN the traffic load is
typically much lower, and QPs add little overhead since the sensors are in a “standby” or
“sleep” mode most of the time [4].

Some terminology related to spectrum sensing is needed here. The FCC has set limits
for the IDT, the probability of detection (PD), the probability of false alarm, the maximum
probability of false alarm (MPFA), the channel move time (CMT), and the channel closing
transmission time (CCTT) for CR networks. The same set of limits would apply to a CWSN, but some
parameter values can be relaxed for a WSN because WSN transmitters use much lower transmit power
than the 802.22 devices for which these protocols and regulations are
intended [11].

Modifying the existing 802.15.4 protocol, on which ZigBee is based, to suit the CWSN physical and
MAC layers would be a very good approach. The 802.15.4 standard defines 16 channels, each of
2 MHz bandwidth, in the 2.4 GHz band [14], of which only four do not overlap with
802.11 channels (of 22 MHz bandwidth) in the same band [11].
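
The count of four can be reproduced from the channel plans. The sketch below uses the standard 2.4 GHz center-frequency formulas for 802.15.4 channels 11 to 26 and for 802.11 channels 1 to 13, together with the bandwidths quoted above. It assumes, as an illustration, that the 802.11 network uses the common non-overlapping set of channels 1, 6, and 11; other channel plans give a different result.

```python
def zigbee_center_mhz(channel):
    """Center frequency of IEEE 802.15.4 channel 11..26 (2.4 GHz band)."""
    if not 11 <= channel <= 26:
        raise ValueError("2.4 GHz 802.15.4 channels are numbered 11 to 26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel):
    """Center frequency of IEEE 802.11 channel 1..13 (2.4 GHz band)."""
    if not 1 <= channel <= 13:
        raise ValueError("expected an 802.11 channel from 1 to 13")
    return 2407 + 5 * channel


def overlaps(center_a, width_a, center_b, width_b):
    """True when the two occupied bands share more than a band edge."""
    return abs(center_a - center_b) < (width_a + width_b) / 2


WIFI_CHANNELS_IN_USE = (1, 6, 11)  # assumption: common non-overlapping set

clear_channels = [
    ch for ch in range(11, 27)
    if not any(overlaps(zigbee_center_mhz(ch), 2, wifi_center_mhz(w), 22)
               for w in WIFI_CHANNELS_IN_USE)
]
print(clear_channels)
```

Under these assumptions the clear 802.15.4 channels are 15, 20, 25, and 26, which sit in the gaps between and above the three 802.11 channels.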

Whenever an incumbent signal is detected well above the IDT, the WSN has to switch to a backup
channel within the CMT to avoid interference. This requires the coordinator to broadcast
a channel switch command, which also contains the scheduled switching
time for the nodes. The command instructs the WSN nodes to switch to the
free channel specified by the coordinator. If no free channel is available, they
have to use the backup channel. The coordinator is responsible for this switching: it
decides how to allocate channels, that is, how to schedule free channels among the nodes that
need them. The coordinator can use its own spectrum sensing results (centralized spectrum
sensing) or the spectrum sensing reports from other specialized spectrum sensing
nodes (distributed spectrum sensing).
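
The coordinator's reaction can be sketched as a small decision function. Everything named here (`ChannelSwitchCommand`, `on_sensing_result`, the choice of switching halfway through the CMT) is a hypothetical illustration of the behavior described above, not a message format or timing rule taken from any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelSwitchCommand:
    new_channel: int
    switch_time_ms: int  # scheduled time at which all nodes retune


def on_sensing_result(power_dbm, idt_dbm, now_ms, cmt_ms,
                      free_channels, backup_channel):
    """Return a switch command if an incumbent is detected, else None."""
    if power_dbm <= idt_dbm:
        return None  # nothing above the detection threshold
    # Prefer a channel known to be free; fall back to the backup channel.
    new_channel = free_channels[0] if free_channels else backup_channel
    # Assumed policy: schedule the switch halfway through the allowed
    # channel move time, leaving the rest as margin for the broadcast.
    return ChannelSwitchCommand(new_channel, now_ms + cmt_ms // 2)


cmd = on_sensing_result(power_dbm=-80, idt_dbm=-95, now_ms=1000,
                        cmt_ms=2000, free_channels=[], backup_channel=25)
print(cmd)
```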

In practice, the coordinator has to maintain a list of available backup channels, as there is a
good probability that many backup channels are available at a particular time and in a
particular geographical region.

Generally, a WSN does not require a very fast incumbent recovery mechanism, although once the
PUs appear, the nodes have to vacate the channel very quickly. In WSNs with
very tight delay requirements, however, backup channels have to be provisioned. The backup
channel can come either from the licensed band shared with the PUs or from the unlicensed band
shared with other SUs. The backup channel has to be sensed regularly to make sure that it
is readily available and clean.


9.5 Spectrum Sensing Schemes for CWSN 


Energy efficient spectrum sensing techniques are required to meet the power constraints of the 
CWSN. Spectrum sensing can be done in both time domain and frequency domain. 

How much energy should be spent on channel sensing? Although high
energy budgets result in accurate sensing outcomes, they may not be needed all the time.
In some cases, the interfering signal may be sporadic, or may be received with very high
power and thus be easy to detect. In such cases there is no need for high-budget spectrum sensing
algorithms. For cognition, there should be provision to tailor the energy budget to the
signal strength. The sensing energy budget should also be tailored to the size of the packet
to be transmitted. In a WSN, there are packets of several types and sizes, and these packets have
different priorities. Sensor packets can be as small as one to a few tens of bytes; therefore,
selecting the right amount of energy to devote to spectrum sensing can significantly improve
energy efficiency. The loss of a long packet may cause a retransmission of the packet, which may
be even more costly. Hence, high energy budgets can be applied to sense the spectrum in such
cases, whereas low-budget sensing algorithms can be applied when small packets are to be
transmitted. In this way, lower overall energy consumption can be achieved [15].
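
The tailoring described above amounts to a small decision rule. The sketch below is a hypothetical illustration: the function name, the two budget levels, and both thresholds (a -70 dBm "strong signal" level and a 64-byte "long packet" size) are assumptions made for the example, not values from Ref. [15].

```python
def choose_sensing_budget(packet_bytes, interferer_dbm=None, high_priority=False,
                          strong_signal_dbm=-70.0, long_packet_bytes=64):
    """Pick a 'low' or 'high' sensing energy budget for the next packet.

    interferer_dbm is the last observed interferer power, or None if no
    interferer has been observed. All thresholds are illustrative.
    """
    if interferer_dbm is not None and interferer_dbm >= strong_signal_dbm:
        return "low"    # a strong interferer is easy to detect cheaply
    if high_priority or packet_bytes >= long_packet_bytes:
        return "high"   # losing this packet would force a costly resend
    return "low"        # short, low-priority packet: keep sensing cheap


print(choose_sensing_budget(packet_bytes=8))
print(choose_sensing_budget(packet_bytes=120))
print(choose_sensing_budget(packet_bytes=120, interferer_dbm=-50.0))
```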

We now give an overview of the spectrum sensing techniques available for CWSN; a good survey of
the techniques that can be used in CWSN is provided in [15].
Spectrum sensing can be done individually or in a cooperative manner to identify
the spectrum holes frequently. The cognitive transceiver has two important functionalities:
spectrum sensing and adaptation. The spectrum sensing hardware of the cognitive transceiver keeps
sensing the spectrum over a wide frequency band. This information is then passed to the SUs
(in our case, the WSN nodes). When a spectrum hole is found, the SUs adapt their
transmission power, center frequency, and modulation in order to transmit efficiently and to
minimize interference to the incumbents [16]. An implied assumption here is that all the nodes
have reconfigurable hardware.

Also, while transmission is in progress, the cluster head or network coordinator, whichever
performs spectrum sensing, should be able to detect the appearance of incumbents
so that the SUs can change or vacate the channel when the PUs start transmitting on that
channel.

Spectrum sensing for CWSN can be categorized as blind sensing and signal-specific sensing. 


9.5.1 Blind Sensing Techniques 


Sensing techniques that do not rely on any special signal features are called blind sensing
techniques. Several variants are available; a brief overview follows, and
more details can be found in Refs. [17,18].

Energy detectors: The energy (power) detector estimates the signal power in the channel and
compares that estimate with a threshold to determine whether an incumbent is present [17].
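
A minimal sketch of an energy detector is given below. The threshold is a hypothetical value chosen for the illustration; in practice it would be derived from the noise power and the target probability of false alarm. The noise and tone samples are deterministic stand-ins for received data.

```python
import math


def energy_detect(samples, threshold):
    """Return (incumbent_present, average_power) for a block of samples."""
    power = sum(abs(s) ** 2 for s in samples) / len(samples)
    return power > threshold, power


# Deterministic stand-in for noise: alternating +/-0.1, average power 0.01.
noise = [0.1 * (-1) ** n for n in range(256)]
# Hypothetical incumbent: a unit-amplitude tone added to the noise.
tone = [math.sin(2 * math.pi * 0.05 * n) for n in range(256)]
received = [w + s for w, s in zip(noise, tone)]

threshold = 0.05  # assumed: five times the noise power
print(energy_detect(noise, threshold))
print(energy_detect(received, threshold))
```

The detector needs no knowledge of the incumbent waveform, which is what makes it blind, but its decision is only as good as the noise-power estimate behind the threshold.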

Eigenvalue-based sensing: Another blind sensing technique uses eigenvalues of the correlation 
matrix of the received signal containing both signal and noise [18]. 
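
One common form of this idea compares the largest and smallest eigenvalues of the sample covariance matrix: for white noise alone they are nearly equal, while a correlated signal pulls them apart. The sketch below is a deliberately small instance that assumes real-valued samples and a 2x2 covariance matrix built from lags 0 and 1, for which the eigenvalues have a closed form; the decision threshold of 1.3 and the test signal are assumptions of the example, not values from Ref. [18].

```python
import math
import random


def max_min_eigen_ratio(samples):
    """Ratio of the largest to the smallest eigenvalue of the 2x2 sample
    covariance matrix [[r0, r1], [r1, r0]] built from lags 0 and 1."""
    n = len(samples) - 1
    r0 = sum(x * x for x in samples) / len(samples)
    r1 = sum(samples[i] * samples[i + 1] for i in range(n)) / n
    lam_max, lam_min = r0 + abs(r1), r0 - abs(r1)  # closed-form eigenvalues
    if lam_min <= 0:
        return float("inf")  # fully correlated samples
    return lam_max / lam_min


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
# Hypothetical incumbent: a slowly varying sinusoid buried in the noise.
mixed = [w + 2.0 * math.sin(0.3 * n) for n, w in enumerate(noise)]

THRESHOLD = 1.3  # assumed decision threshold on the eigenvalue ratio
print(max_min_eigen_ratio(noise) > THRESHOLD)
print(max_min_eigen_ratio(mixed) > THRESHOLD)
```

Unlike the energy detector, the ratio does not depend on the absolute noise level, so the threshold does not need a noise-power estimate.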


9.5.2 Signal Specific Sensing Techniques 


Sensing techniques that utilize specific signal features are listed below. More details can
be found in Refs. [19-24].

Signature sensing for ATSC signal identification: The specific signature found in Advanced
Television Systems Committee (ATSC) signals is used to detect the primary transmission [19].

FFT-based carrier sensing: This sensing technique involves estimating the power spectral density 
in the received signal and detecting the availability of a carrier [20]. 
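
The following sketch illustrates the principle. It computes the power spectrum with a plain discrete Fourier transform so that the example is self-contained; a real implementation would use an FFT routine. Flagging bins that exceed a multiple of the median bin power is an assumed detection rule for this illustration, as are the block length and the test tone.

```python
import cmath
import math
import random


def power_spectrum(samples):
    """Periodogram |X[k]|^2 / N computed with a direct DFT (O(N^2))."""
    n = len(samples)
    spectrum = []
    for k in range(n):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def carrier_bins(samples, ratio=50.0):
    """Indices of bins whose power exceeds `ratio` times the median bin
    power, used here as a rough estimate of the noise floor."""
    psd = power_spectrum(samples)
    noise_floor = sorted(psd)[len(psd) // 2]
    return [k for k, p in enumerate(psd) if p > ratio * noise_floor]


rng = random.Random(0)
n = 64
# Hypothetical carrier at bin 8 plus weak Gaussian noise.
received = [math.cos(2 * math.pi * 8 * t / n) + rng.gauss(0.0, 0.1)
            for t in range(n)]
print(carrier_bins(received))
```

Because the input is real valued, a carrier at bin 8 also appears at the mirror bin 56.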

Higher order statistics (HOS) sensing: This sensing technique works with the assumption that
the noise is Gaussian. Higher order statistics can be used to estimate how well the distribution
of the test statistic matches a Gaussian distribution [21].
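
As a simple illustration of the principle, the excess kurtosis (a fourth-order statistic) is zero for Gaussian data, so a clearly nonzero value suggests that something other than noise is present. The sketch below applies it directly to the received samples; the threshold of 0.5 and the sinusoidal incumbent are assumptions of this example rather than the specific test of Ref. [21].

```python
import math
import random


def excess_kurtosis(samples):
    """Sample excess kurtosis: fourth central moment / variance^2 - 3."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return m4 / (m2 * m2) - 3.0


def looks_non_gaussian(samples, threshold=0.5):
    """Declare a signal present when the samples deviate from Gaussianity."""
    return abs(excess_kurtosis(samples)) > threshold


rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]
# Hypothetical incumbent: a strong sinusoid (amplitude 3) on top of the noise.
mixed = [w + 3.0 * math.sin(0.37 * n) for n, w in enumerate(noise)]
print(looks_non_gaussian(noise), looks_non_gaussian(mixed))
```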

PLL-based ATSC pilot sensing: This sensing technique consists of two frequency tracking blocks,
each attempting to track the ATSC pilot frequency. Multiple methods can be used to implement
the frequency tracking block. This method suggests a digital phase-locked loop (PLL) based on a
version of the Costas loop [22].

Wireless microphone covariance sensing: This method is similar to the eigenvalue-based sensing 
technique described earlier and it calculates the sample covariance matrix [23]. 

Correlation sensing of the spectrum: Here an estimate of the power spectral density (PSD) of the
signal is calculated using the FFT. This PSD estimate is then correlated with a prestored PSD
for the signal of interest. This technique is not suitable for WSN, however, because of the
limited memory available for data storage [24].


9.5.3 Cooperative Sensing 


When sensing the spectrum at a given frequency, the two major causes of signal degradation are
multipath and shadowing. Cooperative spectrum sensing helps here. The presence of
multiple radios reduces the effects of severe multipath through the diversity achieved:
it reduces the probability that all users see deep fades at the same time [25].
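
The diversity gain is easy to quantify if the sensing nodes are assumed to fade and decide independently, which is an idealization made only for this sketch. It also shows the price of the simple OR fusion rule (declare the channel busy if any node says so): the false alarm probability grows along with the detection probability. The per-node probabilities used are hypothetical.

```python
def prob_all_in_deep_fade(p_fade, n_nodes):
    """Probability that every one of n independent nodes is in a deep fade."""
    return p_fade ** n_nodes


def or_rule_detection(pd_single, n_nodes):
    """Probability that at least one of n independent nodes detects the PU."""
    return 1 - (1 - pd_single) ** n_nodes


def or_rule_false_alarm(pfa_single, n_nodes):
    """Probability that at least one of n independent nodes raises a false alarm."""
    return 1 - (1 - pfa_single) ** n_nodes


for n in (1, 2, 4):
    print(n,
          prob_all_in_deep_fade(0.1, n),
          round(or_rule_detection(0.6, n), 4),
          round(or_rule_false_alarm(0.05, n), 4))
```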

There are several spectrum sensing techniques, each with its own pros and cons. Some are
fast but less accurate, and some are slow but more accurate. The energy budgets of the
sensing techniques also vary. The choice therefore has to be optimized so that low overall
energy consumption and reasonable accuracy are achieved.

In Ref. [26], we have suggested a novel spectrum sensing architecture specifically for CWSN:
a doubly cognitive architecture in which cognition is achieved
not only with respect to time and space but also with respect to traffic.


9.6 Learning in CWSN 


In this section we give a brief overview of the control/learning mechanisms that can be used
in CWSN. There are several possible approaches:

Centralized: All the nodes depend on a centralized spectrum server for all of their spectrum
requirements. The spectrum server determines and returns a suitable configuration to the nodes.

Distributed: All the nodes negotiate directly with their neighbors and decide on a
suitable spectrum allocation.

Local algorithm based: Every node in the network follows some set of rules locally instead of 
having a centralized controller or distributed negotiation. 

Centralized architectures require a control channel that is available to every
node in the deployment; that is, the channel used to exchange control messages has to be known
to every node and unoccupied by other stations in the area. Situations may occur, however, in
which this control channel goes into a deep fade or becomes occupied (when it is a channel
shared with the PUs). Moreover, the system can become congested as more nodes join
the network. Distributed control is better suited to large environments, where the same control
channel cannot be maintained throughout the sensor field, but it is more computationally
intensive and its performance may be suboptimal. Several CR control algorithms
are discussed in Ref. [27]. All of these control mechanisms involve some form of
coordination between the nodes.

Let us have a brief look at each of them.

Rule-based reasoning: It is a particular type of reasoning that uses “if-then-else” rule statements. 
Rules are simply patterns, and an inference engine searches for patterns in the rules that match 
patterns in the data. It has been used in reconfiguring parameters and applications in order to 
accommodate the changes in the environment [28]. In Ref. [29], an idea of fuzzy rules has been 
used, where qualitative rules without hard decision boundaries can be employed. 
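
A rule base of this kind can be sketched as an ordered list of condition and action pairs that a tiny inference loop scans for the first match. The state fields, thresholds, and action names below are hypothetical and serve only to show the “if-then-else” structure with hard decision boundaries; a fuzzy rule base would replace the boolean conditions with degrees of membership.

```python
# Each rule pairs a condition on the observed state with an action.
# Rules are checked in order, so earlier rules take precedence.
RULES = [
    (lambda s: s["pu_detected"], "switch_channel"),
    (lambda s: s["snr_db"] < 5.0, "lower_modulation_order"),
    (lambda s: s["battery"] < 0.2, "reduce_tx_power"),
]


def infer(state, rules=RULES, default="keep_configuration"):
    """Return the action of the first rule whose condition matches."""
    for condition, action in rules:
        if condition(state):
            return action
    return default


print(infer({"pu_detected": False, "snr_db": 3.0, "battery": 0.9}))
print(infer({"pu_detected": True, "snr_db": 3.0, "battery": 0.1}))
print(infer({"pu_detected": False, "snr_db": 20.0, "battery": 0.8}))
```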

Response surface methodology (RSM)/Design of experiments (DoE): RSM is a means by which
the relationship between several explanatory variables and one or more response variables is
explored. In RSM, sequences of designed experiments are used to obtain an optimal response.
The model is easy to estimate and apply, even when little is known about the process, but
the estimated optimum may not be the real optimum. In Refs. [30,31], RSM has been used
to characterize and learn rules that can be used in the adaptation and configuration of CR
parameters.

Game theory: Game-theoretic approaches are increasingly common for the control of
cognitive networks. Examples can be found in Refs. [32-36].

Genetic algorithms: A genetic algorithm is a search heuristic that mimics the process of natural
evolution and is used to generate useful solutions to optimization and search problems. To use a
genetic algorithm, one needs to represent a solution to the problem as a genome (or chromosome).
The genetic algorithm then creates a population of solutions and applies genetic operators such
as mutation and crossover to evolve the solutions in order to find the best one. It can be used
to search for a CR configuration that optimizes a predetermined fitness function. Hardware
implementations of such genetic-algorithm-based optimization are reported in Refs. [37,38].
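
The steps above can be sketched in a few lines. In this illustration a genome is a tuple of indices into three hypothetical parameter tables, and the fitness function is an invented score that rewards data rate and penalizes transmit energy; neither is taken from Refs. [37,38]. The search space here is small enough to enumerate, so the genetic algorithm is shown purely to illustrate the mechanics.

```python
import random

# Hypothetical configuration space for a reconfigurable transceiver.
CHOICES = {
    "tx_power_dbm": [-10, -5, 0, 5],
    "bits_per_symbol": [1, 2, 4, 6],
    "coding_rate": [0.5, 0.75],
}
GENES = list(CHOICES)


def fitness(genome):
    """Invented score: reward throughput, penalize energy and weak links."""
    power_dbm, bits, rate = (CHOICES[g][i] for g, i in zip(GENES, genome))
    throughput = bits * rate                # information bits per symbol
    energy_mw = 10 ** (power_dbm / 10)      # transmit power in mW
    # Assumed constraint: denser constellations need more transmit power.
    shortfall_db = max(0.0, 3.0 * bits - 8.0 - power_dbm)
    return throughput - 0.5 * energy_mw - 0.4 * shortfall_db


def random_genome(rng):
    return tuple(rng.randrange(len(CHOICES[g])) for g in GENES)


def crossover(a, b, rng):
    cut = rng.randrange(1, len(GENES))      # single-point crossover
    return a[:cut] + b[cut:]


def mutate(genome, rng, rate=0.2):
    return tuple(rng.randrange(len(CHOICES[g])) if rng.random() < rate else i
                 for g, i in zip(GENES, genome))


def evolve(seed_genome, generations=30, pop_size=12, rng=None):
    rng = rng or random.Random(0)
    population = [seed_genome] + [random_genome(rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]   # truncation selection
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children         # parents survive (elitism)
    return max(population, key=fitness)


best = evolve(seed_genome=(0, 0, 0))
print({g: CHOICES[g][i] for g, i in zip(GENES, best)}, round(fitness(best), 3))
```

Because the fitter half of each generation is carried over unchanged, the best score never decreases from one generation to the next, so the result is at least as good as the seed configuration.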

Linear programming: In linear programming, a mathematical model is developed to determine
the best outcome for a list of requirements expressed as
linear relationships. Refs. [39-41] discuss the application of linear programming
to spectrum allocation, dynamic spectrum access for CDMA systems, and
optimal link activation scheduling in CR networks, respectively.

Swarm algorithm: Swarm intelligence is the collective behavior of decentralized, self-organized
systems. In such algorithms, the nodes follow some very simple rules and there is no
centralized control; only the interaction between the nodes leads to the emergence of
intelligent global behavior. Ant colonies and fish schools are natural examples. In Ref. [26],
this concept is used for local independent control of CR networks, which includes interference
avoidance, coordination in the network, and network synchronization. The feasibility of this
approach has been shown by its hardware implementation.


9.7 Challenges in Designing CWSN 


Lifetime maximization or energy efficiency: In many cases, it is impractical to recharge nodes
after they are deployed, as WSN nodes receive no individual human attention and are usually
deployed in the field at random, unknown locations [42-50]. The same applies to CWSN, which
inherits the problems of lifetime maximization and energy efficiency from the WSN. In fact,
the situation worsens because, apart from data sensing, the nodes in a CWSN are also
involved in spectrum sensing. They also have to tune their RF parameters as required
to achieve spectrum efficiency. The choice of modulation scheme, data rate, and coding scheme
also directly influences the power consumption at each node.

PU detection and localization: Sensor nodes are the SUs in a CWSN. They share
the licensed band with the PUs but must not interfere with them. To
avoid such interference, sensor nodes must be aware of the PUs and their locations within the
region of interest. Hence, a CWSN needs to detect and localize PUs in the vicinity of the
network.

Fusion: Information fusion deals with the combination of information from the same source or
from different sources to obtain an improved fused estimate of greater quality or greater
relevance. In the CWSN scenario, apart from data sensing, the nodes are also involved in
spectrum sensing. Most of the time, this spectrum sensing is done in a cooperative manner, in
which the spectrum sensing nodes share their sensing information with each other. This makes the
fusion task even more challenging. As larger numbers of sensors are deployed in harsher
environments, it is important that sensor fusion techniques be robust and fault tolerant, so
that they can handle uncertainty and faulty sensor readouts.

Routing: In a CWSN, the topology keeps changing dynamically because sensor nodes can adjust their
transmission parameters, and a CWSN node can turn its transceiver on or off depending on the
presence of a PU. If a node exhausts its available energy, it ceases to function. Moreover, in
the dynamic environment of a CWSN, spectrum is not always available to all the sensor nodes for
data transfer. This makes the routes between the base station and the nodes very dynamic and
adds new challenges to routing in CWSN.

Resource allocation problem: Resource allocation, that is, spectrum scheduling in CWSN, should 
allocate spectrum fairly among all the sensor nodes and at the same time it should increase the 
spectrum utilization. 

Power allocation: Power allocation is another challenging problem, since co-channel
interference must be considered when multiple new users decide to use the same frequency. Hence, some
control mechanism, distributed or centralized, is needed to manage the co-channel interference. 

Optimization of the radio module: By adapting the modulation type and constellation size and 
channel coding rate, different data rates can be achieved which will directly influence the power 
consumption of each node, and in turn will affect the lifetime of the whole network. There may be 
several trade-offs between parameters to optimize the radio module. The selection of these radio 
parameters and changing these parameters on the fly is another challenging problem in CWSN. 

Spectrum sensing: The sensing duration should be kept short, because during this
interval all traffic is suspended while the node performs spectrum sensing. Choosing the
sensing duration is therefore a challenge in itself. Proper choice of quiet periods is also very important as it 
directly affects the throughput of the network. Apart from this, choosing a proper spectrum sensing 
algorithm according to the size/priority of data packets and the signal strength also matters a lot in saving 
the energy of the spectrum sensing nodes. 

Representation of network architecture of CWSN: A network architecture helps in designing and 
understanding the various functions of the network and its nodes, which matters because a CWSN has a
dynamic environment and may sometimes be mobile. Responsibilities may be given to a network coordinator or 
to every node in the network, and each choice translates to a different architecture. Hence, it is 
very difficult to find a general network architecture. 


9.8 Summary 


Application of CR and cognitive network concepts in WSN is an emerging technology. The primary 
difficulty with wireless sensor nodes is their battery life. But another significant problem of spectrum 
scarcity has emerged these days as the unlicensed spectrum bands are getting overcrowded. At the 
same time, the licensed spectrum bands are seen to be unoccupied most of the time, which can be 
used opportunistically. Using CR concept in WSN leads to observing, learning, and adapting in 
order to achieve spectrum efficiency, good throughput, and better QoS. Several architectures are 
proposed for CWSN. But it is difficult to say which one is better, as it depends on the environment 
where the network is deployed as well as the network size. Spectrum sensing is the most important 
task in any CR-based technology. Several spectrum sensing algorithms have been developed for 
mobile applications, but they are not suitable for the WSN scenario because of the limited battery life and 
simpler hardware of sensor nodes. The spectrum sensing algorithms need to be dynamically tailored according 
to signal strength as well as the data packet size/priority. Managing and controlling the entire cognitive 
network is also challenging and a lot of research is going on in this area; distributed, 
centralized, and local independent control mechanisms have all been proposed. Apart 
from the limitations which CWSN inherits from WSN, there are some particular challenges like PU 
detection, PU localization, sensing information fusion, resource (channel and power) allocation, 
optimization of radio module, spectrum sensing algorithm, and a generalized architecture for such 
a network. The challenges are big but the technology is very promising. 


10.1 Introduction 


Signal processing (SP) is the process of information extraction or decision making by analyzing 
relevant observational or experimental data. SP such as estimation and detection is fundamental 
to numerous applications in wireless sensor networks (WSNs) such as environmental monitoring, 
industrial control, and military surveillance. WSNs designed for such applications normally consist 
of a large number of nodes densely scattered over an area of interest where multihop transmission is 
the only practical way to move data across the network [1]. Networking, therefore, is a vital process 
of multihop WSNs, and is carried out by routing protocols that are responsible for establishing 
routes between source and destination nodes. 

One of the major constraints of WSNs is the limited energy supply. Since radio communication 
is often the most expensive operation a node performs in terms of energy usage, it is crucial 
for maximizing the lifetime of WSN that data packets are routed to the destination in an energy- 
efficient manner [2]. Indeed, intensive research has addressed energy-efficient routing for WSN (see 
Ref. [3] and the references therein). Common route metrics include the number of hops [4-6] and 
energy expenditure [7-11]. Hierarchical protocols [12,13] group nodes into clusters where cluster 
heads are responsible for intracluster data aggregation and intercluster communication in order 
to save energy. Location-based protocols utilize information on node locations to increase energy 
efficiency by relaying data only to certain desired regions [14]. These generic routing algorithms 
establish routes without considering the performance of signal processing (PoSP) that is achievable 
from the data being forwarded along the routes. 

Protocols that incorporate application performance or data quality into routing are also available 
[15-17]. In particular, in data-centric routing, the node desiring certain types of information sends 
queries to certain regions and waits for data from the nodes located in the selected regions [15,16]. 
Alternatively, reference [17] introduces information-directed routing to minimize communication 
cost while maximizing information gain. Reference [18] proposes a link metric that considers 
packet delay as well as network lifetime. 

SP in WSN has also been extensively studied. Traditionally, SP is carried out in a centralized 
manner where measurements from the nodes are collected and processed at a central location. 
In contrast, distributed methods [19-28] spread the computation across participating nodes to 
reduce communication cost and computational burden on any particular point of the system. For 
example, Ref. [26] proposes a hybrid energy-driven scheme where each sensor node sends out its 
1 bit decision if that decision exceeds a predetermined detection accuracy threshold, and sends out 
all its observations otherwise. Two multihop fusion schemes are proposed in Ref. [27]. In the first 
scheme, each sensor transmits the histogram of the observations of its descendants and itself. In the 
second scheme, the normalized log-likelihood ratio (LLR) values for subsets of nodes are computed 
and propagated along the minimum spanning tree to the fusion center (FC) which decides the 
hypothesis based on the acquired LLR values. An energy-efficient distributed source localization 
algorithm is proposed in Ref. [28], where the intermediate estimates are progressively processed by 
nodes along the routes and the refined results are further relayed to the FC. 

Note that routing is not addressed in these distributed SP schemes and, therefore, a requirement 
for applying them is that routes between source and destination nodes are preestablished. One 
work related to joint optimization of routing and detection is Ref. [29]. It develops a serial fusion 
method for signal detection and proposes a routing scheme which is essentially a depth-first traversal 
technique enhanced with knowledge of locations of nodes. The intermediate detection decision 
is passed along the route which tries to traverse the network with as few hops as possible until 
a final detection decision can be made. However, the role of each node is considered identical; 
no quality of measurement of individual nodes is considered. Research on routing for signal 
processing (RfSP) protocols can be best represented by [30,31]. The protocols there facilitate 
joint optimization of PoSP and energy efficiency via metrics which directly connect PoSP with 
energy consumption associated with sensing and data transmission of each link along the routes. 
In particular, the problem of routing for the detection of a correlated random signal field is studied 
in Ref. [30]. A new link metric using the Chernoff information is proposed which characterizes 
detection performance based on the number and locations of nodes along the route. This novel 
metric captures the contribution of a given link to the decay rate of error probability of detection, 
and the route is determined using the shortest path framework through a centralized optimization 
algorithm. The problem of energy-efficient routing for detection subject to the Neyman-Pearson 
criterion is studied in Ref. [31]. There, routing metrics that connect the gain in signal-to-noise 
ratio and energy cost of each link are proposed, and combinatorial optimization programs are 
developed to solve for the best route in terms of a detection-probability-to-energy ratio. 

Protocols in Refs. [30,31] are both centralized RfSP schemes, where the routes are computed 
centrally requiring complex optimization algorithms and global information such as locations and 
observation coefficients of all nodes in the network. Reference [32] proposes a distributed RfSP 
scheme for the same problem as in Ref. [31] where the routing decision is made at each node 
autonomously based on locally available information only. Clearly, for large-scale networks, or 
networks with dynamically changing topologies, distributed routing schemes which require neither 
global information nor centralized optimization would be more practical due to their superior 
flexibility and scalability. 

In this chapter we address issues of building link metrics and the associated data aggregation 
schemes that realize RfSP, review RfSP protocols, and discuss the potential of and challenges facing 
RfSP. We also discuss in greater detail how distributed RfSP schemes can be designed. Finally, 
we conduct numerical simulations to reveal the SP performance and energy efficiency of both 
centralized and distributed RfSP schemes as compared with generic routing protocols. 

This chapter is organized as follows. Section 10.2 describes the problem concerning SP and 
RfSP in WSN. Section 10.3 reviews centralized RfSP schemes. Section 10.4 introduces distributed 
RfSP strategies. Section 10.5 provides the simulation results and Section 10.6 draws the conclusion. 


10.2 Signal Processing and RfSP 


In this section we give a general introduction to RfSP and discuss issues of building link metrics 
and the associated data aggregation schemes that one has to address in developing RfSP protocols. 


10.2.1 Signal Processing Problem 


Consider the scenario where sensor nodes are deployed over an area to collect measurements on 
a particular event or phenomenon. These measurements are then fed to a SP algorithm for the 
purpose of, for example, detecting the occurrence of the event or determining some information 
about the phenomenon. The data observed at node $N_k$ can be described by 


$$y_k = g_k s_k + w_k \tag{10.1}$$


where 
$s_k$ is the unknown signal sampled by $N_k$ 
$w_k$ is the observation noise 
$g_k$ is the observation coefficient, which in general depends on the characteristics of node $N_k$ 
and its physical location 


In a centralized SP scheme, all raw measurements $\{y_k\}$ are transmitted to a central location, that is, 
the FC, where the processing outcome $D$ is achieved through a centralized computation: 


$$D = F(\{y_k\}) \tag{10.2}$$


where $F(\cdot)$ denotes the fusion rule applied to the data. 

One major issue of centralized SP schemes is the energy inefficiency associated with radio 
transmissions since no data compression is made when the packets are being relayed by intermediate 
nodes. As stated in Refs. [30,31], the energy consumption at each node $N_k$ can be modeled by 


$$e_{kj} = \delta \Delta_{kj}^{\gamma} + e_p \tag{10.3}$$


where 
$e_p$ accounts for the energy consumption for sensing and processing 
$\delta \Delta_{kj}^{\gamma}$ is the energy of one data transmission over the link from $N_k$ to its next-hop $N_j$, which 
depends on the distance $\Delta_{kj}$ between $N_k$ and $N_j$ 


Consider a route of $M$ equally spaced nodes, $\Omega = \{N_1, N_2, \ldots, N_M\}$, where the $M$-th node $N_M$ 
is acting as the FC. Let $\Delta$ denote the node spacing. It can be easily verified that the total energy 
expenditure for the transmission along the route is $(1 + 2 + \cdots + (M - 1))(\delta \Delta^{\gamma} + e_p)$. 


10.2.2 Distributed Signal Processing 


In a distributed SP scheme, nodes collectively achieve a global objective by each performing some 
processing based on locally available information and relaying the intermediate results. In particular, 
apart from the first node, the $k$-th node on the route receives the intermediate result $u_{k-1}$ from 
the $(k - 1)$-th node and combines it with its local observation $y_k$ to generate its own intermediate 
result $u_k$, that is, 


$$u_k = \begin{cases} F_1(y_1), & k = 1 \\ F_k(u_{k-1}, y_k), & k > 1 \end{cases} \tag{10.4}$$


which is relayed to the $(k + 1)$-th node. Note that $u_k$ may consist of multiple values. For example, 
it can include an estimate of a parameter and the associated confidence level. The last node on the 
route (i.e., the FC) generates the final result 


$$D = u_M = F_M(u_{M-1}, y_M) \tag{10.5}$$


The strength of distributed SP is its potential for reducing energy consumption in transmission. 
Suppose $u_k$ can be sent with one transmission; the total energy expenditure for the transmission 
along the route of $M$ equally spaced nodes would then be $(M - 1)(\delta \Delta^{\gamma} + e_p)$. In addition, in a 
distributed SP scheme, there is no special requirement on energy or computational power imposed 
on the FC. Here, any node can potentially act as an FC and the fusion (processing) process can stop 
at any node as soon as a preset performance measure (e.g., confidence level associated with the 
estimation) is achieved. 
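
The two energy totals quoted above, $(1 + 2 + \cdots + (M - 1))(\delta \Delta^{\gamma} + e_p)$ for centralized SP and $(M - 1)(\delta \Delta^{\gamma} + e_p)$ for distributed SP, can be compared with a minimal sketch. The function names and the numerical values are illustrative assumptions and are not taken from Refs. [30,31].

```python
def hop_energy(delta: float, spacing: float, gamma: float, e_p: float) -> float:
    """Energy of one hop in the form of (10.3): transmission plus sensing/processing."""
    return delta * spacing ** gamma + e_p


def centralized_energy(m: int, delta: float, spacing: float, gamma: float, e_p: float) -> float:
    # The raw sample of the k-th node is relayed over (m - k) hops, so the
    # route carries 1 + 2 + ... + (m - 1) = m(m - 1)/2 transmissions in total.
    transmissions = m * (m - 1) // 2
    return transmissions * hop_energy(delta, spacing, gamma, e_p)


def distributed_energy(m: int, delta: float, spacing: float, gamma: float, e_p: float) -> float:
    # Each of the (m - 1) links carries a single intermediate result u_k.
    return (m - 1) * hop_energy(delta, spacing, gamma, e_p)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to make the arithmetic easy to follow.
    params = (1.0, 2.0, 2.0, 0.5)  # delta, spacing, gamma, e_p
    print("centralized:", centralized_energy(10, *params))
    print("distributed:", distributed_energy(10, *params))
```

Under these assumptions the centralized total exceeds the distributed one by a factor of $M/2$, which is why the saving grows with route length.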

Despite all of their advantages, distributed SP algorithms are generally nontrivial to design. 
In other words, for a centralized fusion rule $F(\cdot)$ of (10.2), it is a challenging task to produce 
equivalent local fusion rules $\{F_k(\cdot)\}$ that collectively and incrementally achieve the objective and 
performance of $F(\cdot)$. This is a problem-specific practice and sometimes simplifying assumptions 
and approximations are necessary [18,24,28]. 


10.2.3 Routing for Signal Processing 


RfSP protocols further exploit the potential of distributed SP schemes. In addition to energy- 
efficient data delivery via distributed processing, the issue of where to collect these data is addressed 
in RfSP. The goal of RfSP is the joint optimization of PoSP and energy efficiency via routing 
metrics which directly connect SP performance and energy consumption. The advantages of RfSP 
over generic routing protocols are the ability to achieve optimum PoSP-to-energy-consumption 
ratio and the controlled trade-offs between PoSP and energy efficiency. The solutions to RfSP are 
highly dependent on the specific SP problems to solve, and usually involve the following three 
critical processes: 


1. Design local fusion rules $\{F_k(\cdot)\}$ to be applied by nodes progressively and associated 
intermediate results to be forwarded further downstream along the route. 

2. Develop a link metric $q_{kj}$ that characterizes the contribution of a link to the performance 
of SP. It measures the gain of the link from node $N_k$ to its next-hop $N_j$ by means of a 
quantity directly related to PoSP, for example, estimation variance or detection probability, 
and therefore is in general dependent on the data quality (as measured by the observation 
coefficient) of $N_j$ and the physical locations of both the local and next-hop nodes. 

3. Build up a routing metric $\Lambda(\cdot)$ that relates the total PoSP contribution ($Q(\Omega)$) and energy 
expenditure ($E(\Omega) = \sum_k e_{k,(k+1)}$) of a route $\Omega$, and obtain the desired route $\Omega^*$ through 
the following constrained optimization: 


$$\Omega^* = \arg\max_{\Omega} \Lambda(Q(\Omega), E(\Omega)) \tag{10.6}$$


where the maximization is taken over the set of routes which satisfy certain constraints. Typical constraints are 
C1: $E(\Omega) \le E_{\mathrm{lim}}$ (route energy constraint) 


C2: $Q(\Omega) \ge Q_{\mathrm{lim}}$ (guaranteed performance) 


For example, the routing metric $\Lambda(\cdot) = Q(\Omega)/E(\Omega)$ under C1 leads to the route that has the best 
performance-to-energy ratio among all routes with energy expenditure up to $E_{\mathrm{lim}}$. Note that the 
maximization in (10.6) could be replaced by minimization depending on how the metric $\Lambda(\cdot)$ is 
formulated. 
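
The constrained optimization in (10.6) can be illustrated with a minimal sketch that enumerates a small set of candidate routes, discards those violating C1, and keeps the route with the largest performance-to-energy ratio. The node names, link gains, and link energies are hypothetical, and exhaustive enumeration stands in for the optimization algorithms of the cited references; it is practical only for very small networks.

```python
def route_links(route):
    """Consecutive (k, j) node pairs along a route."""
    return list(zip(route, route[1:]))


def route_performance(route, gain):
    # Additive link metric: Q(route) is the sum of the per-link gains.
    return sum(gain[link] for link in route_links(route))


def route_energy(route, energy):
    # E(route) is the sum of the per-link energies.
    return sum(energy[link] for link in route_links(route))


def best_route(candidates, gain, energy, e_lim):
    """Return the feasible route maximizing Q/E under constraint C1, or None."""
    feasible = [r for r in candidates if route_energy(r, energy) <= e_lim]
    if not feasible:
        return None
    return max(feasible, key=lambda r: route_performance(r, gain) / route_energy(r, energy))


# Hypothetical four-node example (values are illustrative only).
GAIN = {("A", "B"): 3.0, ("B", "D"): 1.0, ("A", "C"): 1.0, ("C", "D"): 2.0, ("B", "C"): 3.0}
ENERGY = {("A", "B"): 2.0, ("B", "D"): 2.0, ("A", "C"): 1.0, ("C", "D"): 4.0, ("B", "C"): 1.0}
CANDIDATES = [("A", "B", "D"), ("A", "C", "D"), ("A", "B", "C", "D")]

if __name__ == "__main__":
    print(best_route(CANDIDATES, GAIN, ENERGY, e_lim=6.0))
    print(best_route(CANDIDATES, GAIN, ENERGY, e_lim=10.0))
```

Relaxing the energy limit admits the longer route, which in this toy example has the better ratio; this is the controlled trade-off that the constraint C1 provides.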

The task of developing RfSP protocols is, in general, challenging. First of all, as mentioned 
previously, the “splitting” of a central SP algorithm $F(\cdot)$ into local rules $\{F_k(\cdot)\}$ is nontrivial. 
Secondly, the development of the link metric which captures the gain in performance of individual 
links can be very complex. In addition, in order to facilitate solving the optimization in (10.6), it 
is important for a link metric to be (1) independent, that is, the contribution of a link is independent of past 
and future links; and (2) additive, that is, $Q(\Omega) = \sum_k q_{k,(k+1)}$. As will be demonstrated in Section 10.3, 
depending on the specific SP problem, approximations are often required in deriving a link metric 
of desired forms. 

Note that the optimization in the form of (10.6) is carried out centrally, demanding global 
information such as locations and observation coefficients of all nodes in the network. For such 
centralized RfSP schemes, although the SP is performed in a distributed fashion, the routes have to 
be precomputed and nodes programmed for each source-destination pair. Therefore, distributed 
RfSP (D-RfSP) schemes which require neither global information nor centralized optimization are 
more practical for large-scale networks, or networks with dynamically changing topologies. In a 
D-RfSP protocol, each node makes its routing decision autonomously based on locally available 
information only. However, since D-RfSP protocols only utilize local information, they will be 
suboptimum as compared to their centralized counterparts. Details on how to design D-RfSP 
protocols will be provided in Section 10.4. 


10.3 Centralized RfSP Schemes 


In this section we highlight the core development process of two innovative RfSP schemes [30,31] 
to illustrate how joint optimization of SP performance and energy efficiency can be achieved for 
specific SP problems. 


10.3.1 Chernoff Routing 


The detection of a correlated Gaussian signal field is considered in Ref. [30]. Let hypothesis $H_1$ 
denote the presence of the phenomenon within the sensor network and $H_0$ its absence, and assume 
all sensors have identical observation coefficients ($g_k = 1$). The observations at each node $N_k$ along 
the route $\Omega = \{N_1, N_2, \ldots, N_M\}$ under each hypothesis are given by 


$$H_0: y_k = w_k, \quad k = 1, \ldots, M$$
$$H_1: y_k = s_k + w_k, \quad k = 1, \ldots, M$$


where the $s_k$'s are correlated Gaussian samples of the signal with $s_k \sim N(0, \sigma_s^2)$ and a nondiagonal 
covariance matrix. The noise samples $w_k$ are i.i.d. Gaussian with $w_k \sim N(0, \sigma_w^2)$. The centralized 
decision rule is to decide $H_1$ if 


$$\ln \frac{p_1(y)}{p_0(y)} \ge \ln \frac{\pi_0}{\pi_1} \tag{10.7}$$


where 
$y = [y_1, \ldots, y_M]$ 
$\pi_j$ and $p_j(y)$, $j = 0, 1$, are the prior probability of $H_j$ and the probability density function (PDF) 
of the joint Gaussian random variables under $H_j$, respectively 


The development of the link metric for RfSP starts with using the Chernoff information as a 
tractable metric which captures the probability of detection error of a given route. Then, Schweppe’s 
recursive representation of the likelihood function is used to express the Chernoff information in an 
additive form. Finally a link metric which is independent from link to link is created by assuming 
a Gauss-Markov signal correlation model. 

As stated in Ref. [33], the average error probability of the rule (10.7) is bounded by 


$$P_e \le \pi_0^{1-s} \pi_1^{s} e^{\mu(s)}, \quad 0 \le s \le 1$$


where $\mu(s)$ is the cumulant generating function of the LLR under $H_0$, that is, 


$$\mu(s) = \ln E_0\left\{ e^{s \ln [p_1(y)/p_0(y)]} \right\}, \quad 0 \le s \le 1$$


with $E_j\{\cdot\}$, $j = 0, 1$, denoting the mathematical expectation under $H_0$ and $H_1$, respectively. The 
Chernoff information [34] is defined as $-\mu(s)$ evaluated at the maximizing value of $s$, that is, 


$$Q = -\mu(s^*), \quad s^* = \arg\max_{0 \le s \le 1} \{-\mu(s)\} \tag{10.8}$$


In order to reveal the contribution of each link to $P_e$, Schweppe's recursive representation of 
the likelihood function [35] is utilized and it is shown that (10.8) can be approximated by 
$Q = \sum_k q_k$ with 


$$q_k = \frac{1}{2} \ln\left(1 + \frac{P_{k|k-1}}{\sigma_w^2}\right) \tag{10.9}$$


where $P_{k|k-1} = E_1\{(s_k - E_1\{s_k \mid y_1, \ldots, y_{k-1}\})^2\}$ is interpreted as the power of the signal innovation. 


The proposed link metric (10.9) is monotonic in $P_{k|k-1}$. It indicates the amount of uncertainty resolved 
by collecting a sample from node $N_k$, and the optimal route is the one that provides the maximum 
reduction in uncertainty. 

Since the link metric (10.9) is in general not independent from link to link, in order to 
make the optimization of finding the optimal route tractable, the signal field is assumed to be a 
Gauss-Markov process. For this special case, after some approximations, the link metric can be 
expressed as a function of the link length $\Delta_{kj}$, 


1 
qu = 3 In(SNR + 1 — (SNR — eres) (10.10) 


where $A$ is a parameter in the signal model, and $\mathrm{SNR} = \sigma_s^2/\sigma_w^2$. The Gauss-Markov model also 
allows the detection to be carried out in a distributed fashion using the Kalman aggregation, where 
each node, based on data from the previous hop and its local measurement, calculates and sends to its 
next-hop three quantities: the accumulated likelihood function, the variance of innovation, and 
the predicted measurement. 

With this link metric which characterizes the additive and independent contribution of each 
link to detection performance, the Chernoff routing [30] finds the optimal route using the shortest 
path framework. In shortest path routing [36], each link in the network is assigned a link cost $\gamma_{kj}$ 
which quantifies the consumed resources of the link $N_k$ to $N_j$, and the “least cost” route, where 
the cost of a route is simply the sum of its link costs, is sought. A constant link cost, that is, 
$\gamma_{kj} = \epsilon$, leads to the minimum-hop routing. Alternatively, setting $\gamma_{kj} = e_{kj}$, where $e_{kj}$ is the 
link energy consumption given by (10.3), results in the minimum-energy routing. The link cost of 
the Chernoff routing introduces a weighting factor $\alpha > 0$ to control trade-offs between detection 
performance and energy: 


$$\gamma_{kj} = (e_{kj} - \alpha q_{kj})^{+}$$


where 


$$(x)^{+} = \begin{cases} x, & x > 0 \\ \epsilon, & x \le 0 \end{cases}$$


It is shown in Ref. [30] that with proper values of the design parameters $\alpha$ and $\epsilon$, the Chernoff 
routing is able to achieve better detection performance for the same energy consumption than the 
minimum-hop and minimum-energy protocols. 
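
The shortest path formulation with a Chernoff-type link cost can be sketched as follows. This is a generic Dijkstra search rather than the implementation of Ref. [30]; the graph, the per-link values of $e_{kj}$ and $q_{kj}$, and the function names are hypothetical, and the floor value `eps` plays the role of the small positive constant that keeps every link cost positive.

```python
import heapq


def chernoff_link_cost(e_kj: float, q_kj: float, alpha: float, eps: float) -> float:
    """Link cost (e_kj - alpha * q_kj) floored at a small positive eps."""
    x = e_kj - alpha * q_kj
    return x if x > 0 else eps


def least_cost_route(links, source, dest, alpha, eps):
    """Dijkstra search; links maps node -> list of (neighbor, e_kj, q_kj)."""
    dist = {source: 0.0}
    prev = {}
    done = set()
    heap = [(0.0, source)]
    while heap:
        d, k = heapq.heappop(heap)
        if k in done:
            continue
        done.add(k)
        if k == dest:
            break
        for j, e_kj, q_kj in links.get(k, []):
            nd = d + chernoff_link_cost(e_kj, q_kj, alpha, eps)
            if nd < dist.get(j, float("inf")):
                dist[j] = nd
                prev[j] = k
                heapq.heappush(heap, (nd, j))
    if dest not in dist:
        return None
    route = [dest]
    while route[-1] != source:
        route.append(prev[route[-1]])
    return route[::-1]


# Hypothetical network: the route via C costs more energy but offers a larger gain.
LINKS = {
    "A": [("B", 2.0, 0.5), ("C", 3.0, 2.0)],
    "B": [("D", 2.0, 0.5)],
    "C": [("D", 3.0, 2.0)],
}

if __name__ == "__main__":
    print(least_cost_route(LINKS, "A", "D", alpha=0.0, eps=0.01))  # energy only
    print(least_cost_route(LINKS, "A", "D", alpha=1.0, eps=0.01))  # gain weighted in
```

With `alpha = 0` the cost reduces to the link energy and the search returns the minimum-energy route; increasing `alpha` shifts the choice toward links with a larger performance contribution.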


10.3.2 Combinatorial Optimization Routing 


The problem of target detection is considered in Ref. [31]. The measurement $y_k$ at node $N_k$ along 
the $M$-node route $\Omega$ depends on which of the two hypotheses, the noise-only hypothesis $H_0$ and 
target-present hypothesis $H_1$, is true. That is, 


$$H_0: y_k = w_k, \quad k = 1, 2, \ldots, M \tag{10.11}$$
$$H_1: y_k = \beta g_k + w_k, \quad k = 1, 2, \ldots, M \tag{10.12}$$


where $g_k$ is the observation coefficient of $N_k$, and $\beta > 0$ represents the reflection strength of the target 
in response to illuminating signals and is assumed unknown. The noise $w_k$ is white Gaussian with 
zero mean and variance $\sigma^2$. The PDFs under each hypothesis are, therefore, given by 


$$p_0(y) = \prod_{k=1}^{M} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{y_k^2}{2\sigma^2}\right) \tag{10.13}$$
$$p_1(y) = \prod_{k=1}^{M} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_k - \beta g_k)^2}{2\sigma^2}\right) \tag{10.14}$$

According to the Neyman-Pearson criterion that maximizes the probability of detection $P_D$ for a 
given probability of false alarm $P_{FA}$ [37], $H_1$ is decided if the likelihood ratio $L(y) = p_1(y)/p_0(y)$ 
is greater than a threshold. By utilizing (10.13) and (10.14), this criterion can be realized 
by the test 


$$T(y) = \sum_{k=1}^{M} g_k y_k > \gamma \tag{10.15}$$


where $\gamma$ is determined via 


$$P_{FA} = \int_{\{y : T(y) > \gamma\}} p_0(y)\, dy \tag{10.16}$$


The test (10.15) immediately suggests a distributed processing scheme where each nondestination 
node $N_j$ ($j < M$) computes the locally accumulated test statistic $\sum_{k=1}^{j} g_k y_k$ and forwards 
it to the next-hop, and the detection decision is made at the destination node based on the 
accumulated $T(y)$. 
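
The distributed accumulation of the test statistic can be sketched as below. Each node applies a local rule of the form (10.4), adding its own weighted sample to the running sum received from the previous hop, and the destination compares the result with the threshold. The function names and sample values are illustrative assumptions.

```python
def local_fusion(u_prev: float, g_k: float, y_k: float) -> float:
    """Local rule at one node: add the weighted local sample to the running statistic."""
    return u_prev + g_k * y_k


def route_statistic(g, y) -> float:
    """Relay the running statistic hop by hop along the route."""
    u = 0.0  # the first node starts from an empty accumulation
    for g_k, y_k in zip(g, y):
        u = local_fusion(u, g_k, y_k)
    return u


def decide_target_present(t: float, threshold: float) -> bool:
    """Decision at the destination node: True means the target-present hypothesis is chosen."""
    return t > threshold


if __name__ == "__main__":
    # Hypothetical observation coefficients and measurements for a three-node route.
    g = [1.0, 0.5, 2.0]
    y = [2.0, 4.0, 1.0]
    t = route_statistic(g, y)
    print(t, decide_target_present(t, threshold=5.0))
```

Only one scalar is forwarded per hop, yet the value reaching the destination equals the centralized statistic computed from all raw samples.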

The link metric for RfSP is derived by examining the performance of this detector as follows. 
Since the noise samples are uncorrelated and the test statistic $T(y)$ is Gaussian for both hypotheses, 
it can be shown that 


$$p_0(T(y)) \sim N(0, \xi\sigma^2)$$
$$p_1(T(y)) \sim N(\beta\xi, \xi\sigma^2)$$


where $\xi = \sum_{k=1}^{M} g_k^2$. 

The threshold $\gamma$ can then be determined according to (10.16) without knowing the value of $\beta$, 
and the probability of detection can be found to be 


$$P_D = \int_{\gamma}^{\infty} p_1(T)\, dT = Q\left(Q^{-1}(P_{FA}) - \sqrt{\kappa^2}\right) \tag{10.17}$$


where 


$$Q(x) = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{t^2}{2}\right) dt$$
$$\kappa^2 = \beta^2 \xi / \sigma^2$$


It can be seen from (10.17) that the detection performance is totally characterized by $\xi$. Since 
$Q(x)$ is a monotonically decreasing function, $P_D$ is monotonically increasing with increasing $\xi$, 
which is defined as the performance contribution of the route: 


$$Q(\Omega) := \sum_{k=1}^{M} g_k^2 \tag{10.18}$$


The corresponding link metric $g_k^2$ captures the performance contribution of each node on the route 
and, as demonstrated in (10.18), has the properties of additivity and independence. To facilitate 
joint optimization of detection performance and energy efficiency, the mean detection-probability-to-energy 
ratio is introduced in Ref. [31]: 


$$\Lambda(\Omega) = \frac{Q(\Omega)}{E(\Omega)} \tag{10.19}$$


and the optimum route is obtained through the following optimization: 


$$\Omega^* = \arg\max_{\Omega} \Lambda(\Omega) \tag{10.20}$$


Two variants of (10.20) are also proposed in Ref. [31] where constraints on $Q(\Omega)$ and $E(\Omega)$ are 
introduced, respectively, to the optimization so that solutions with guaranteed performance or 
constrained energy can be sought. These routing metrics are shown in Ref. [31] to achieve superior 
detection performance and energy consumption trade-offs over the traditional minimum-hop and 
minimum-energy routing protocols. 
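
The detection probability in (10.17) can be evaluated numerically with a short sketch, which also shows how adding a node to the route raises the detection probability through the sum of squared observation coefficients. The sketch uses the standard normal distribution from the Python standard library; the function names and the numerical values are illustrative assumptions, and the reflection strength is treated as known here purely for the purpose of evaluation.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # zero mean, unit variance


def q_func(x: float) -> float:
    """Gaussian tail probability Q(x) = 1 - Phi(x)."""
    return 1.0 - _STD_NORMAL.cdf(x)


def q_inv(p: float) -> float:
    """Inverse of the Gaussian tail probability, valid for 0 < p < 1."""
    return _STD_NORMAL.inv_cdf(1.0 - p)


def detection_probability(g, beta: float, sigma: float, p_fa: float) -> float:
    """P_D for a route whose nodes have observation coefficients g."""
    xi = sum(g_k ** 2 for g_k in g)   # performance contribution of the route
    kappa = beta * sqrt(xi) / sigma   # square root of kappa^2 in (10.17)
    return q_func(q_inv(p_fa) - kappa)


if __name__ == "__main__":
    # Hypothetical values: unit noise, unit reflection strength, 10% false alarm rate.
    for route_g in ([1.0], [1.0, 1.0], [1.0, 1.0, 1.0]):
        print(len(route_g), detection_probability(route_g, 1.0, 1.0, 0.1))
```

When the reflection strength is zero the expression collapses to the false alarm probability, which is a convenient sanity check on the implementation.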


10.4 Distributed RfSP Protocols 


The centralized RfSP schemes demand that global information such as locations and observation 
coefficients $\{g_k\}$ of all nodes in the network be available at a central location where $\Omega^*$ is computed 
in a static manner. As discussed in Section 10.1, in practice not only the network topology can 
change, (2) can also change in situations such as target movements. Such dynamics would make 
recalculating and updating the centrally optimized routes unrealistic. In this section we present 
strategies for distributed RfSP (D-RfSP) where the next-hop is calculated on the fly using only 
locally available information. That is, each node selects the next-hop autonomously with the goal 
of maximizing the SP performance associated with unit energy expenditure. One protocol of such 
nature is proposed in Ref. [32] for a signal detection problem. In a D-RfSP scheme, a local node $N_k$ 
evaluates the performance gain ($q_{kj}$) each neighbor $N_j$ can potentially provide and the associated 
energy cost ($e_{kj}$), and selects the one which yields the largest performance-gain-to-energy ratio as 
the next-hop, that is, 


$$N^* = \arg\max_{N_j \in \Theta} \frac{q_{kj}}{e_{kj}} \tag{10.21}$$


where $\Theta$ denotes the set of $N_k$'s neighbors which meet certain conditions. The neighbors of a node 
are the nodes that are physically positioned within its radio range such that direct communications 
are possible. Apart from designing a distributed signal fusion scheme and the link metric $q_{kj}$, two 
unique issues need to be addressed in developing a D-RfSP scheme. Firstly, an energy constraint 
applied locally at each node is essential for the route to converge. Otherwise the route will attempt 
to traverse the whole network. Secondly, revisits should be avoided unless absolutely necessary since 
a revisited node does not provide any new contribution to the SP performance. The solutions to 
these matters are reflected in the design of the rules for $\Theta$. In this section we first address these two 
matters and then present in detail a D-RfSP strategy that is based on logical relationships of nodes. 
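
The local next-hop rule (10.21) can be sketched as follows. The neighbor table layout, the node names, and the convention that a node marked as visited contributes zero gain are illustrative assumptions made for this example and are not taken from Ref. [32].

```python
def select_next_hop(neighbors, visited=frozenset()):
    """Pick the candidate with the largest performance-gain-to-energy ratio.

    neighbors maps a neighbor name to a (q_kj, e_kj) pair; visited holds the
    names of neighbors that can no longer add to the SP performance.
    """
    best_name, best_ratio = None, float("-inf")
    for name, (q_kj, e_kj) in neighbors.items():
        gain = 0.0 if name in visited else q_kj  # a revisited node adds nothing
        ratio = gain / e_kj
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio
    return best_name


# Hypothetical neighbor table of a local node: name -> (gain, energy cost).
NEIGHBORS = {"N2": (4.0, 2.0), "N3": (3.0, 1.0), "N4": (1.0, 1.0)}

if __name__ == "__main__":
    print(select_next_hop(NEIGHBORS))
    print(select_next_hop(NEIGHBORS, visited={"N3"}))
```

The rule needs only the entries of the local neighbor table, which is what allows each node to make its routing decision without global information.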


10.4.1 Meeting Energy Constraint 


D-RfSP protocols rely on information of neighboring nodes. This information is readily available 
for most WSN systems. In fact, in order to maintain connection to the network, a node needs 
to keep some information such as addresses of its neighbors. This is usually done through a local 
record termed as the neighbor table. The neighbor table is generally built up during a node’s join 
process when it scans its neighborhood in order to discover its neighbors and find a potential 
parent node to join [38,39]. The ZigBee standard requires each node to keep the neighbor table 
up-to-date. This can be achieved, for example, by periodically scanning and/or monitoring the 
neighborhood. 

If the locations of neighbors and the destination are available, to control energy expenditure a 
local node $N_k$ can regard the neighbor $N_j$ as a next-hop candidate (i.e., a member of $\Theta$) only if it 
satisfies the following distance condition: 


$$\delta \Delta_{kj}^{\gamma} + e_p + \delta \Delta_{jM}^{\gamma} + e_p \le E_{k,\mathrm{lim}} \tag{10.22}$$


where $E_{k,\mathrm{lim}}$ is the energy constraint of the route from $N_k$ to the destination $N_M$. Since $N_k$ does 
not know the hops beyond $N_j$, here the energy consumption from the neighbor $N_j$ to $N_M$ is only 
an approximation. Therefore (10.22) is a so-called soft constraint; it can be slightly violated if 
necessary. In Section 10.4, we will show that if the number of hops from $N_j$ to the destination can 
be determined, the corresponding energy cost can be estimated by the average energy per hop. 
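
The distance condition (10.22) can be sketched as a filter over the neighbor table. The sketch assumes two-dimensional node coordinates and approximates the remaining route from the neighbor to the destination by a single direct hop, as the condition does; the coordinates, parameter values, and function names are hypothetical.

```python
from math import dist


def hop_energy(p, q, delta: float, gamma: float, e_p: float) -> float:
    """Energy of one hop between positions p and q, in the form of (10.3)."""
    return delta * dist(p, q) ** gamma + e_p


def next_hop_candidates(local, dest, neighbors, e_lim, delta, gamma, e_p):
    """Names of neighbors whose estimated energy to the destination is within e_lim."""
    candidates = []
    for name, pos in neighbors.items():
        estimate = (hop_energy(local, pos, delta, gamma, e_p)
                    + hop_energy(pos, dest, delta, gamma, e_p))
        if estimate <= e_lim:
            candidates.append(name)
    return candidates


# Hypothetical layout: local node at the origin, destination 4 units away.
LOCAL, DEST = (0.0, 0.0), (4.0, 0.0)
NEIGHBOR_POS = {"N2": (2.0, 0.0), "N3": (1.0, 3.0)}

if __name__ == "__main__":
    print(next_hop_candidates(LOCAL, DEST, NEIGHBOR_POS, 12.0, 1.0, 2.0, 0.5))
    print(next_hop_candidates(LOCAL, DEST, NEIGHBOR_POS, 30.0, 1.0, 2.0, 0.5))
```

Because the second term is only an estimate, a practical protocol would treat the limit as soft, for example by relaxing `e_lim` when the filter returns an empty list.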


10.4.2 Avoiding Unnecessary Revisits 


When the next-hop is determined locally, visiting a node more than once by the same route can 
happen, although this would rarely occur for densely deployed sensor networks. Therefore, a mechanism 
should be built into a D-RfSP protocol to avoid unnecessary revisits while the route traverses 
to the destination. This can be accomplished by the visited node with observation coefficient $g_j > 0$ 
informing all of its neighbors that it will no longer contribute to the SP performance and should 
now be considered as having $g_j = 0$. One way to implement this messaging is through dedicated 
application messages with the associated additional cost in energy and bandwidth. 

Another way to achieve this notification is to exploit the services of CSMA/CA-based MAC 
(medium access control). By using a reserved bit in the RTS (request-to-send) and CTS (clear- 
to-send) packets, the (re)visit status of a node can be probed and conveyed. This mechanism has 
been proposed in Ref. [40] for the purpose of informing neighbors of the type of packets to be 
transmitted. 

The third way of carrying out this messaging is to take advantage of the broadcast nature 
of wireless transmission, where the radio transmission will be heard by all neighbors within the 
transmission range, not just the intended recipient. In the ZigBee protocol, the network layer is 
responsible for discarding overheard packets by checking the receiver address field in the received 
packets. Overhearing has been employed in WSN for network time synchronization [41,42], 
reducing redundant transmissions [43], and propagating the location of the mobile sink to the 
source node [44]. As proposed in the distributed RfSP protocol in Ref. [32], when a node with 
g > 0 is visited for the first time, it uses the energy which is enough to reach its farthest neighbor 
when transmitting to the next-hop. Upon hearing the transmission, both the intended recipient 
and overhearing nodes mark the transmitting node as VISITED and treat it as having g; = 0 in 
subsequent routing. This mark is cleared at the end of current routing session through, say, a 
time-out at each node. The extra energy in transmission (to reach the farthest neighbor) is the cost 
one has to pay for lacking of centralized coordination and optimization. 
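
The overhearing-based bookkeeping described above can be sketched as follows. This is a minimal illustration, not the protocol of Ref. [32]: the class name `NeighborTable`, the `session_timeout` value, and the use of explicit timestamps are all assumptions made for the example.

```python
import time


class NeighborTable:
    """Per-node neighbor state for overhearing-based revisit avoidance (hypothetical structure)."""

    def __init__(self, coefficients, session_timeout=5.0):
        self.coefficients = dict(coefficients)  # neighbor id -> observation coefficient g
        self.visited_at = {}                    # neighbor id -> time its transmission was heard
        self.session_timeout = session_timeout  # assumed routing-session time-out in seconds

    def on_overheard(self, sender_id, now=None):
        """Mark a transmitting neighbor as VISITED, whether or not we are the addressee."""
        if sender_id in self.coefficients:
            self.visited_at[sender_id] = time.monotonic() if now is None else now

    def effective_g(self, neighbor_id, now=None):
        """Return g as used for routing: 0 while the VISITED mark is live, the stored g otherwise."""
        now = time.monotonic() if now is None else now
        stamp = self.visited_at.get(neighbor_id)
        if stamp is not None and now - stamp < self.session_timeout:
            return 0.0
        self.visited_at.pop(neighbor_id, None)  # mark expired (or never set): clear it
        return self.coefficients[neighbor_id]
```

Passing `now` explicitly keeps the sketch deterministic; a real node would rely on its own clock and session boundaries.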


10.4.3 Logical Topology and Tree Routing 


This section discusses the logical structure expressed by nodes’ addresses upon which the D-RfSP
strategy in the next section will be built.

Sensor networks are normally constructed in a spanning tree manner by starting with a root 
node and growing as new nodes join the existing nodes as child nodes. Each node has one and 
only one parent while a parent can have multiple children. The resultant logical relationship (LR) 
is a simple tree structure although the network’s physical topology can be quite complex. One 
parameter associated with each node is its network depth $d$. The root node has $d = 0$ and a nonroot
node has a nonzero depth equal to its parent’s $d + 1$. The depth indicates the minimum
number of hops a transmitted frame must travel, using only parent-child links, to reach the root
node. When a packet from a node $N_i$ travels upward along the tree to the root, it reaches all
its ancestors. Similarly, a node-to-root route from another node $N_j$ covers all ancestors of $N_j$.
The node where the two node-to-root routes merge is termed the joint node (JN) of the two
starting nodes. Note that if the two nodes are the same, the JN is the node itself and, if one node
is an ancestor of the other, their JN is the ancestor node.

To make peer-to-peer communication possible, each node must have a unique identity that is 
typically in the form of its address. Due to the ad-hoc nature of the network topology, an address 
assignment scheme has to be in place to assign addresses to nodes when they join the network. We 
are particularly interested in those schemes where, instead of using randomly generated numbers, 
each parent node autonomously generates addresses for joining nodes according to specific rules. 
Various such structured address assignment schemes have been reported [38,45-52]. The most 
notable of them is the one specified in the ZigBee standard [38]. 

For such structured addresses it has been shown [38] that the LR of nodes on the same branch 
can be determined from their addresses. In particular, from the address of a node, the addresses of 
upstream nodes (i.e., its parent and all other ancestors) and downstream nodes (i.e., its children 
and all other descendants) can be determined. This vertical relationship has led to the so-called 
tree routing (TR) protocol. The TR is a distributed algorithm where the routing decision is made 
locally and the route is restricted to parent-child links only. In particular, when a packet is received 
by an intermediate node, it is forwarded either downward or upward along the logical tree. If the
destination is determined to be a descendant, the packet is sent down the tree, either directly to
the destination if it is a child or, otherwise, to the child that is an ancestor of the destination.
If the destination is not a descendant, the parent node is chosen as the next-hop.
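
The TR forwarding rule can be illustrated with a short sketch. In a real network the ancestor and descendant relationships are derived from the structured addresses; here, purely as an assumption for the example, the tree is given as an explicit `parent` dictionary (root mapped to `None`), and the function names are hypothetical.

```python
def ancestors_chain(node, parent):
    """Return [node, parent, grandparent, ..., root] by following parent links."""
    chain = [node]
    while parent[chain[-1]] is not None:
        chain.append(parent[chain[-1]])
    return chain


def tree_next_hop(current, dest, parent):
    """Tree routing: forward down toward a descendant, otherwise up to the parent."""
    if current == dest:
        return None  # already at the destination
    chain = ancestors_chain(dest, parent)  # dest followed by all of its ancestors
    if current in chain:
        # dest is a descendant: pick the child of `current` that lies on dest's branch
        return chain[chain.index(current) - 1]
    return parent[current]
```

Because only parent-child links are used, a packet between two leaves on different branches always climbs to their common ancestor before descending, which is the inefficiency addressed in the next section.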


10.4.4 LR-Based RfSP Protocols


The TR protocol can be very inefficient because of the restrictions on next-hop candidates. In
fact, relationships richer than those utilized in TR can be identified in networks with structured
address assignment schemes and, therefore, more efficient LR-based routing is possible. It is shown
in Ref. [53] that for such networks the logical distance between any two nodes, in terms of the
number of hops via parent-child links, can be determined from their addresses. In particular, for any
source-destination pair $(N_i, N_d)$, the hop-count from $N_i$ to $N_d$ via parent-child links is given by


$$H_i = d_i + d_d - 2d_{id}, \tag{10.23}$$


where $d_i$, $d_d$, and $d_{id}$ are the depths of $N_i$, $N_d$, and their joint node, respectively. All
three depths can be readily determined from the addresses of $N_i$ and $N_d$. Note that this is a
conservative prediction of the hop-count. That is, it is guaranteed that at most $H_i$ hops are required
for a packet to travel from $N_i$ to $N_d$; if a cross-branch jump happens at a downstream node along
the route, the actual number of hops will be smaller.
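
As a small worked illustration of (10.23), the sketch below computes the joint node and the parent-child hop-count on a toy tree. It assumes the tree is available as an explicit `parent` dictionary rather than being decoded from structured addresses, and the helper names are hypothetical.

```python
def path_to_root(node, parent):
    """Return [node, ..., root]; the node's depth is len(path) - 1."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def joint_node(a, b, parent):
    """First node at which the two node-to-root routes merge."""
    on_a = set(path_to_root(a, parent))
    for node in path_to_root(b, parent):
        if node in on_a:
            return node
    raise ValueError("nodes are not in the same tree")


def logical_hops(a, b, parent):
    """Hop-count via parent-child links: depth(a) + depth(b) - 2 * depth(joint node)."""
    depth = lambda n: len(path_to_root(n, parent)) - 1
    return depth(a) + depth(b) - 2 * depth(joint_node(a, b, parent))
```

For the tree `{0: None, 1: 0, 2: 0, 3: 1, 4: 3, 5: 1}`, nodes 4 and 5 merge at node 1, giving 3 + 2 - 2 = 3 hops (4, 3, 1, 5).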

This property can be used to evaluate all neighbors, not just the parent and children, in finding the
shortest possible route to the destination. As a result, the route can not only traverse vertically along
parent-child links, but also “jump” horizontally to other branches of the tree, generating more
efficient routes. We now present an LR-based D-RfSP protocol which utilizes this routing potential.
We assume that homogeneous nodes are densely deployed and a structured address assignment
scheme is in place to allow determination of (10.23). Each node is required to record locally
the addresses and observation coefficients of its neighbors in its neighbor table. In addition, the
distances to each neighbor are needed; their absolute locations are not required.

The routing starts at the source node $N_1$, which is assumed to be the only node with
knowledge of the route energy constraint $E_{lim}$. It converts $E_{lim}$ to the maximum number of hops
of the route based on the average transmission energy needed to communicate with its neighbors:


$$H_{max} = \frac{E_{lim}}{\frac{1}{K}\sum_{j=1}^{K}\left(\varepsilon \Delta_{1j}^{\nu} + e_0\right)}, \tag{10.24}$$


where $K$ is the number of neighbors of $N_1$, $\Delta_{1j}$ is the distance from $N_1$ to its $j$th
neighbor, and $\varepsilon$, $\nu$, and $e_0$ are the parameters of the energy consumption model (10.3).
This conversion is valid due to the dense deployment and homogeneous nature of the network. Node $N_1$
then groups all neighbors $N_j$ that satisfy the following condition into the set $\Phi$ and selects the
next-hop according to (10.21):


$$H_j + 1 \le H_{max}, \tag{10.25}$$


where $H_j$ is the number of hops $N_j$ takes to reach the destination. The value of $H_{max}$ is transmitted
along with locally derived intermediate SP results to the chosen next-hop, which uses $(H_{max} - 1)$
as the maximum number of hops in selecting its own next-hop. It is assumed that the address of the
destination node is known to all nodes. Otherwise, the destination address could be conveyed from
the source node to subsequent nodes in the same way $H_{max}$ is propagated. This process continues
until the destination is reached.
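
The hop budget of (10.24) and the candidate filter of (10.25) can be sketched as below. This is only an illustration under stated assumptions: the per-hop energy is taken as `eps * d**nu + e0`, the comparison in the filter is treated as non-strict, all energies are in the same unit, and the function names, dictionary keys, and numeric values are hypothetical.

```python
def max_hops(e_lim, neighbor_distances, eps, nu, e0):
    """Eq. (10.24): energy budget divided by the mean per-hop energy to the source's neighbors."""
    energies = [eps * d ** nu + e0 for d in neighbor_distances]
    return e_lim / (sum(energies) / len(energies))


def candidate_set(neighbors, h_max):
    """Eq. (10.25): keep neighbors whose guaranteed hop-count, plus one hop, fits the budget.

    Each neighbor is a dict with a "hops" entry holding its hop-count to the destination.
    """
    return [n for n in neighbors if n["hops"] + 1 <= h_max]
```

The chosen next-hop would then repeat the filter with `h_max - 1`, so the budget shrinks by one at every hop.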

In essence, the next-hop is the neighbor with the maximum performance-gain-to-energy ratio among the
neighbors that have a guaranteed number of hops to the destination. If all these neighbors have zero
observation coefficients (i.e., $g_j = 0$ for all the nodes in $\Phi$), then the minimum number of hops to
the destination is obtained as


$$H_{min} = \min_{N_j \in \Phi} H_j, \tag{10.26}$$


and the node with $H_{min}$ that is physically closest to the local node $N_i$ is chosen as the next-hop,
that is,


$$N_{j^*} = \arg\min_{j} \Delta_{ij} \quad \text{subject to } H_j = H_{min}. \tag{10.27}$$

This D-RfSP strategy is readily implementable in ZigBee networks, where a structured address
assignment scheme is in place and overhearing-based revisit avoidance can be employed. In the
next section, we evaluate the performance of this D-RfSP protocol as applied to a signal estimation
problem.
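
The local next-hop decision, including the fallback of (10.26) and (10.27), might be sketched as follows. The form of the gain-to-energy ratio is an assumption made for this example (gain `g**2`, per-hop energy `eps * d**nu + e0`); rule (10.21) itself is defined earlier in the chapter and is not reproduced here. The dictionary keys are hypothetical.

```python
def select_next_hop(candidates, eps, nu, e0):
    """Pick the next-hop among neighbors that already satisfy the hop budget.

    Each candidate is a dict with keys "g" (observation coefficient),
    "hops" (hop-count to the destination) and "dist" (distance to the local node).
    """
    if not candidates:
        return None  # no neighbor fits the hop budget
    informative = [n for n in candidates if n["g"] > 0]
    if informative:
        # assumed gain-to-energy ratio: g^2 over the energy of the hop to that neighbor
        return max(informative, key=lambda n: n["g"] ** 2 / (eps * n["dist"] ** nu + e0))
    # all coefficients are zero: fewest hops first (10.26), then physically closest (10.27)
    h_min = min(n["hops"] for n in candidates)
    return min((n for n in candidates if n["hops"] == h_min), key=lambda n: n["dist"])
```

Neighbors already marked VISITED would enter this function with `g` set to zero, so they can still serve as plain relays in the fallback branch.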


10.5 Performance Evaluation 


We conduct simulations to reveal the SP performance and energy efficiency of the LR-based
D-RfSP scheme in comparison with that of its centralized counterpart (C-RfSP), the TR, minimum
hop (min-hop), and minimum-energy (min-energy) protocols.

Consider the following data observation model at the $k$th node of route
$\Omega = \{N_1, N_2, \ldots, N_M\}$:


$$y_k = g_k\theta + w_k, \quad k = 1, 2, \ldots, M, \tag{10.28}$$


where
$g_k \ge 0$ is the observation coefficient
$w_k$ is the white Gaussian noise with zero mean and variance $\sigma^2$
$\theta$ is the unknown signal to be estimated


It is shown in Ref. [54] that the best linear unbiased estimate and its variance are given by


$$\hat{\theta} = \frac{\sum_{k=1}^{M} g_k y_k}{Q(\Omega)}, \tag{10.29}$$

$$\mathrm{var}(\hat{\theta}) = \frac{\sigma^2}{Q(\Omega)}, \tag{10.30}$$


where $Q(\Omega) = \sum_{k=1}^{M} g_k^2$. The estimation (10.29) can be carried out in a distributed manner
where each intermediate node $N_l$ ($l < M$) sends the locally computed results $\sum_{k=1}^{l} g_k y_k$ and
$\sum_{k=1}^{l} g_k^2$ to the next-hop. It can be seen from (10.30) that $q_j = g_j^2$ captures the
performance contribution of each neighbor $N_j$ and is thus chosen as the link metric for RfSP schemes.
Accordingly, the D-RfSP protocol implements the LR-based scheme introduced in Section 10.4 (which may
incur extra transmission energy as per the overhearing-based revisit avoidance), and the C-RfSP protocol
obtains the optimum route through the following optimization:


$$\Omega^* = \arg\max_{\Omega} \Lambda(\Omega) \quad \text{subject to } E(\Omega) \le E_{lim}, \tag{10.31}$$


where $E(\Omega)$ is the route energy consumption and $\Lambda(\Omega) = Q(\Omega)/E(\Omega)$ is the
estimation-gain-to-energy ratio.
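
The distributed evaluation of (10.29) and (10.30) amounts to passing two running sums along the route. The sketch below illustrates this; the function names and the tuple representation of the forwarded state are assumptions made for the example.

```python
def forward_partial_sums(partial, g, y):
    """Add this node's contribution to the running sums (sum of g*y, sum of g^2)."""
    s_gy, s_gg = partial
    return (s_gy + g * y, s_gg + g * g)


def blue_estimate(partial, sigma2):
    """Final node: best linear unbiased estimate and its variance from the two sums."""
    s_gy, s_gg = partial
    return s_gy / s_gg, sigma2 / s_gg


# Usage: three nodes on a route, noise-free observations of theta = 1 for clarity.
partial = (0.0, 0.0)
for g in (0.01, 0.02, 0.0):
    partial = forward_partial_sums(partial, g, y=g * 1.0)
estimate, variance = blue_estimate(partial, sigma2=3.98e-5)
```

A node with a zero observation coefficient leaves both sums unchanged, which is why such nodes add energy cost without improving the estimate.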

An event-driven simulator developed in MATLAB® is employed. In all the simulation scenarios,
the ZigBee network parameters are set as $(C_m, R_m, L_m) = (5, 5, 6)$, where $C_m$, $R_m$, and
$L_m$ denote the maximum number of children a parent can have, the maximum number of router
children a parent can have, and the maximum network depth, respectively. Two types of nodes are
specified in ZigBee networks, namely routers that can accommodate other nodes as children and
end devices that cannot have children of their own. The network setup process, including the address
assignment scheme of the ZigBee standard, is followed. In particular, after the nodes are deployed
with every node having acquired a physical location, the root node is powered on to start the
network. All the other nodes then power on and search the neighborhood for a potential parent,
which is by definition a router that has already joined the network and has capacity to accommodate
more children. The joining node and potential parent then exchange join request and response
messages to complete the join process, by which the joining node is assigned an address. The
network is established after all nodes have joined the network. The target is then deployed and each
node builds up its neighbor table, which contains the observation coefficients of and distances to its
neighbors.

For the C-RfSP optimization (10.31), all routes satisfying the energy constraint are sought
[55] and the one with the largest estimation-gain-to-energy ratio is selected [31]. The routes of
the min-hop and min-energy protocols are found using the algorithms in Ref. [55] as well. All
these operations require centralized optimization based on the deployment information, namely the
nodes’ locations and observation coefficients. The solutions of these schemes are, therefore, globally
optimal in terms of the corresponding metrics. For the D-RfSP scheme, the routes are determined
progressively at each node as the routes traverse to the destination. The TR protocol also determines
the route locally but, like the min-hop and min-energy protocols, does so without considering the
estimation performance. An event is defined as the transmission of a network layer packet from
the source node to the destination node along the route determined by a routing protocol under 
study. The events happen sequentially, that is, an event starts after the previous one finishes. As a 
result, there is no packet collision or channel contention during packet transmission. This allows 
us to focus on the routing protocols under study by examining the performance of the routes 
generated. 

We consider scenarios where 100 nodes, each with a transmission range of 200 m, are
randomly deployed in a square region of 1000 m by 1000 m. The destination node is the
root node, which is located at the centre of the region (500 m, 500 m), and the signal source (i.e.,
the target) is located at (100 m, 100 m). The observation coefficient of a node within the sensing range
$T_s$ is inversely proportional to its distance $\Delta_t$ to the target, that is,

$$g = \begin{cases} 1/\Delta_t, & \text{if } \Delta_t \le T_s \\ 0, & \text{otherwise} \end{cases} \tag{10.32}$$


For each simulation scenario, 200 instances of the sensor network are randomly generated. For each
instance, a source node is randomly chosen among the nodes within 100 m of the target
and the routes to the destination node established by the routing protocols are studied, that is, their
estimation variances and estimation-gain-to-energy ratios $\Lambda(\Omega)$ are determined. The results over
all the network instances are then averaged to produce the final measure. The following parameters
of the data model (10.28) and energy consumption model (10.3) are chosen: $\theta = 1$,
$\varepsilon = 10^{-2}$, $\nu = 2$, $e_0 = 0.5$ µJ, and the noise variance is chosen to be
$\sigma^2 = 3.98 \times 10^{-5}$, which amounts to a 4-dB SNR at a node 100 m away from the target.
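
The stated noise level can be checked with a few lines of arithmetic. The sketch assumes that the SNR at a node is the received signal power $(g\theta)^2$ divided by the noise variance, and that a node exactly at the sensing range still observes the target; neither convention is spelled out in the text, and the function names are hypothetical.

```python
import math


def observation_coefficient(dist, sensing_range):
    """Observation model (10.32): 1/distance inside the sensing range, 0 outside."""
    return 1.0 / dist if dist <= sensing_range else 0.0


def snr_db(g, theta, sigma2):
    """Assumed SNR definition: signal power (g * theta)^2 over noise variance, in dB."""
    return 10.0 * math.log10((g * theta) ** 2 / sigma2)


g_100m = observation_coefficient(100.0, sensing_range=200.0)  # 0.01
snr_at_100m = snr_db(g_100m, theta=1.0, sigma2=3.98e-5)       # close to 4 dB
```

With $g = 0.01$ and $\theta = 1$ the signal power is $10^{-4}$, and $10^{-4} / 3.98 \times 10^{-5} \approx 2.51$, which is about 4 dB.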

Figure 10.1 shows one instance of the network and the routes determined by the routing
protocols when the sensing range is $T_s = 200$ m and the energy constraint is $E_{lim} = 5$ µJ, where
the star denotes the target and small circles denote sensor nodes. Table 10.1 lists the routes and
associated performance metrics.

Figure 10.1 One instance of the network and routes.

Table 10.1 Routes in Figure 10.1 and Their Performance Metrics

Protocol      Route Ω          E(Ω) in µJ   Q(Ω)        Λ(Ω) in 1/µJ   var(θ̂)
Min-hop       1→8→9→6→7        3.4691       0.9050e-3   0.2609e-6      0.0440
Min-energy    1→4→5→12→7       3.2878       0.9375e-3   0.2852e-6      0.0425
TR            1→8→10→11→6→7    4.0759       0.9050e-3   0.2220e-6      0.0440
C-RfSP        1→2→3→4→5→12→7   4.3613       1.8972e-3   0.4350e-6      0.0210
D-RfSP        1→2→3→4→5→6→7    5.2261       1.8972e-3   0.3630e-6      0.0210

It can be seen from Table 10.1 that the min-energy route, as expected, has the least energy
consumption. Since the min-hop route is not unique and the earliest identified one is chosen,
it does not have the same performance metrics as the min-energy route even though they have
the same number of hops in this particular case. The TR route is comparable to the other two
generic protocols in both estimation performance and energy efficiency since, in
essence, it is a hop-count-based scheme. Both the C-RfSP and D-RfSP routes initially traverse away from
the destination in order to cover nodes with more information (i.e., the ones close to the target).
The energy efficiency $\Lambda(\Omega)$ of the D-RfSP route is less than that of the C-RfSP route, which is the
optimum solution in this regard. This inferior performance of the D-RfSP is due to the fact that a
node only knows its neighbors and makes routing decisions based on local information only. Both
RfSP routes are shown to yield more accurate estimates and higher energy efficiency than
the three generic routes.

Figure 10.2 Estimation variances with changing energy constraint.

Figures 10.2 and 10.3 show the $\mathrm{var}(\hat{\theta})$ and $\Lambda(\Omega)$, respectively, of the routes
when $T_s = 200$ m and $E_{lim}$ changes from 4 to 8 µJ. The estimation performance and energy efficiency
of the TR, min-hop, and min-energy routes remain constant since they are independent of energy constraints.
Both centralized and distributed RfSP protocols have increased estimation accuracy when the
energy constraint is increased. For the D-RfSP protocol, it appears that when $E_{lim}$ is tight (<5 µJ)
there is not much room to maneuver, and when $E_{lim}$ is loose (>7 µJ) the route traverses a larger
number of nodes to increase the estimation gain $Q(\Omega)$, resulting in reduced energy efficiency in
both cases. The two RfSP schemes are shown to yield better estimates and energy efficiency than
the three generic routing protocols.

Figures 10.4 and 10.5 show the $\mathrm{var}(\hat{\theta})$ and $\Lambda(\Omega)$, respectively, of the routes
when $E_{lim} = 5$ µJ and $T_s$ changes from 100 to 500 m. When $T_s$ increases, all five routes are shown
to produce better estimates since more nodes are included in the sensing range. These additional nodes,
however, have relatively small observation coefficients because they are further away from the
target (see (10.32)). This leads to reduced energy efficiency of D-RfSP routes since they try to
capture more estimation gain by visiting an increased number of nodes. A possible modification
to the D-RfSP protocol is to incorporate a threshold $T$ and treat all neighbors with $g_j < T$ as
having $g_j = 0$. Once again, the two RfSP protocols outperform the three generic protocols in both
estimation performance and energy efficiency.

Figure 10.4 Estimation variances with changing sensing range.

Figure 10.5 Energy efficiency with changing sensing range.


10.6 Concluding Remarks 


Routing protocols that jointly optimize SP performance and energy efficiency have great potential 
in achieving energy-efficient routing. Although inferior to the global optimum solution
in terms of energy efficiency, distributed RfSP protocols do not require centralized optimization 
and, therefore, are viable alternatives to centralized schemes for large-scale networks due to their 
flexibility and scalability. Simulation results have shown that both centralized and distributed RfSP 
protocols are able to achieve significant improvement in both SP performance and energy efficiency 
over generic routing schemes. 

Developing RfSP schemes is challenging and problem specific. Future work in this area could
involve extending link metrics for Gauss-Markov signals to more general models and dealing with
colored or non-Gaussian noise. Another interesting direction is to develop RfSP schemes for other WSN
SP algorithms such as beam-forming [56], channel identification [57], and principal component
analysis [58].


11.1 Introduction


Wireless sensor networks (WSNs) have experienced rapid growth in recent years due to
the joint efforts of academia and industry in developing this technology. While
academia is developing innovative solutions aimed at making sensor networks
pervasive, industry has started to push standardization activities
and real implementations of reliable WSN-based systems [1,2]. WSNs are now
envisioned for a wide range of applications as an effective replacement for older
wired and wireless systems, which are more expensive and harder to set up because they require
power and connection cables. Examples of WSN applications include climatic monitoring [3,4],
structural monitoring of buildings [5,6], human tracking [7], military surveillance [8], and, more
recently, multimedia-related applications [9-11].

The development of wireless multimedia sensor networks (WMSNs) has been fostered mainly
by a new generation of low-power, high-performance microcontrollers able to speed up the
processing of a single wireless node, as well as by new microcameras
and microphones imported from the mobile phone industry. Along with classical multimedia
streaming applications in which voice and images are sent through the network, pervasive
WMSNs, consisting of large deployments of camera-equipped devices, may support new
vision-based services. By collecting and analyzing images from the scene, anomalous and potentially
dangerous events can be detected [12], advanced applications based on human activities [13]
can be enabled, and intelligent services, such as WMSN-based intelligent transportation systems
(ITS) [14], can be provided.

The successful design and development of vision-based applications in WMSNs cannot be achieved
without feasible implementations of the computer vision techniques involved. In such a context,
state-of-the-art computer vision algorithms cannot be directly applied [15] because of the reduced
capabilities of the camera nodes in terms of memory availability, computational power, and CMOS
resolution. In fact, since WMSNs usually require a large number of sensors, possibly deployed over
a large area, the unit cost of each device should be as small as possible to make the technology
affordable. As a consequence of the strong limitations in hardware capabilities, low-complexity
computer vision algorithms must be adopted, striking a suitable trade-off between algorithm
performance and resource constraints.

In this chapter, we tackle the problem of developing low-complexity computer vision
algorithms targeted at WMSN devices for enabling pervasive ITS. To this end, we present a
parking space monitoring algorithm able to detect the occupancy status of a parking space while
filtering spurious transitions in the scene. The algorithm has been developed by adopting only
basic computer vision techniques, and its performance is evaluated in terms of sensitivity,
specificity, execution time, and memory occupancy by means of a real implementation on a WMSN
device.


11.2 Smart Cameras for Wireless Multimedia Sensor Networks 


The performance of vision-based algorithms targeted at WMSN devices mainly depends on the 
computational capabilities of the whole smart camera system. In this section, we provide an overview 
of the most popular embedded vision platforms in the WMSNs domain, as well as a description of 
their main hardware and image processing characteristics. The described platforms are compared
in Table 11.1, and pictures of the devices are shown in Figure 11.1.

Table 11.1 Platforms Characteristics Comparison

Platform           Sensor                   CPU             Application
WiCa [16]          2 Color CMOS 640x480     Xetal IC3D      Local processing, collaborative reasoning
Cyclops [17]       Color CMOS 352x288       ATMega 128      Collaborative object tracking
MeshEye [18]       Color CMOS 640x480       ARM7            Distributed surveillance
CMUcam3 [19]       Color CMOS 352x288       ARM7            Local image analysis
CITRIC [20]        Color CMOS 1280x1024     XScale PXA270   Compression, tracking, localization
Vision Mesh [21]   Color CMOS 640x480       ARM9            Image-based water analysis

Figure 11.1 Smart cameras for wireless multimedia sensor networks. (a) WiCa. (b) Cyclops.
(c) MeshEye. (d) CMUcam3. (e) CITRIC. (f) Vision Mesh.


In recent years, several research initiatives have produced prototypes of smart cameras able to
perform onboard image processing. Among the first such devices is the
WiCa [16] camera, developed by NXP Semiconductors Research. The platform is equipped with the
NXP Xetal IC3D processor, based on an SIMD architecture with 320 processing elements, and can
host up to two CMOS cameras at VGA resolution (640 x 480). The communication standard
adopted to send data through a wireless network is IEEE802.15.4. WiCa has been adopted for
image-based local processing and collaborative reasoning applications.

The Cyclops [17] project is another research initiative aimed at developing an embedded vision
platform for WMSNs. The device is equipped with a low-performance ATMega128 8 bit RISC
microcontroller with 128 kB of FLASH program memory and only 4 kB of SRAM data memory.
The CMOS sensor supports three image formats (8 bit monochrome, 24 bit RGB color, and
16 bit YCbCr color) at CIF resolution (352 x 288). The board is not equipped with a wireless
transceiver; however, wireless communication can be enabled in IEEE802.15.4 networks via a
MicaZ mote. On the Cyclops board, the camera module provides a complete image processing
pipeline for performing demosaicing, image size scaling, color correction, tone correction, and
color space conversion. The effectiveness of Cyclops has been demonstrated in collaborative object
tracking applications.

In the MeshEye [18] project, an energy-efficient smart camera mote architecture based on the
ARM7 processor was designed, mainly targeted at intelligent surveillance applications. The MeshEye
mote has a distinctive vision system based on a stereo configuration of two low-resolution,
low-power cameras, coupled with a high-resolution color camera. In particular, the stereo vision
system continuously determines the position, range, and size of moving objects entering its fields of
view. This information triggers the color camera to acquire the high-resolution image subwindow
containing the object of interest, which can then be efficiently processed. To communicate with
peer devices, MeshEye is equipped with an IEEE802.15.4-compliant transceiver.

Another interesting example of a low-cost embedded vision system is the CMUcam3 [19],
developed at Carnegie Mellon University. More precisely, the CMUcam3 is the
third generation of the CMUcam series, which has been specially designed to provide an
open-source, flexible, and easy development platform targeted at robotics and surveillance applications.
The hardware platform is more powerful than previous CMUcam boards and may
be used to equip low-cost embedded systems with vision capabilities. It
consists of a CMOS camera, an ARM7 processor, and a slot for MMC cards. Wireless
transceivers are not provided on board, and communications can be enabled through mote systems
(e.g., IEEE802.15.4-based communications via a FireFly mote).

More recently, the CITRIC [20] platform integrates in one device a camera sensor, an XScale
PXA270 CPU (with frequency scalable up to 624 MHz), 16 MB of FLASH memory, and
64 MB of RAM. Such a device, once equipped with a standard wireless transceiver, is suitable for
the development of WMSNs. The design of the CITRIC system allows moderate
image processing tasks to be performed in the network nodes. As a result, the requirements on
transmission bandwidth are less stringent than in simple centralized solutions. CITRIC capabilities have
been illustrated by three sample applications: image compression, object tracking by means of
background subtraction (BS), and self-localization of the camera nodes in the network.

Finally, we cite the Vision Mesh [21] platform. The device integrates an Atmel 9261 ARM9
CPU, 128 MB NandFlash, 64 MB SDRAM, and a CMOS camera at VGA resolution (640 x 480).
The high computational capabilities of the embedded CPU make it possible to compress acquired images
in JPEG format as well as to perform advanced computer vision techniques targeted at water
conservancy engineering applications. In-network processing of the acquired visual information
may be enabled by means of IEEE802.15.4-based communications.

All the smart camera devices reported above represent an effective solution for enabling vision-based
applications in WMSNs. The general trend in developing such devices is to increase computational
capabilities without taking power consumption into account. The lowest power
consumption reported among the presented smart camera devices is greater than 650 mW [22], still a
prohibitive figure for autonomous setups in pervasive contexts.


11.3 Image Processing for Parking Space Occupancy Detection


WMSNs are considered a key technology in enabling pervasive ITS [23]. The use of low-cost smart
cameras to detect parking space occupancy levels provides an effective and cheaper alternative to
state-of-the-art magnetic field sensors installed under the asphalt [24]. In this section, we present a
low-complexity computer vision solution for detecting the occupancy status of a single parking
space. The algorithm can easily be run in multiple instances on real smart camera devices
dedicated to monitoring a set of parking spaces, until full coverage of the parking lot is reached.


11.3.1 Background Subtraction Approach in WMSNs 


Classical computer vision approaches for monitoring applications usually rely on BS-based
algorithms [25]. Depending on the required constraints in terms of frame rate and image size,
as well as on the technique adopted to model the background (e.g., mixture of Gaussians, kernel
density estimation, etc.), a BS approach can cover a variety of performance and complexity
levels.

A BS-based approach, reinforced with frame differencing (FD), is at the basis of the
presented algorithm, which has been designed with low complexity as its objective. As already discussed in
Section 11.1, state-of-the-art computer vision algorithms cannot be directly applied in a WMSN
scenario due to several smart camera constraints: memory size, computational power, CMOS
resolution, and energy consumption. The reduced amount of memory and the CMOS capabilities
have a strong impact on image frame resolutions and color depths: feasible resolution values
are 160x120, 320x240, and 640x480 pixels, usually in gray scale. The energy consumption
constraint is directly related to the maximum allowable frame rate: feasible values are lower than
2 fps (state-of-the-art computer vision algorithms usually work at 25 or 30 fps), due to the necessity
of increasing the device idle time during which all the peripherals are turned off. The
limited computational capabilities, in turn, require the development of simple background modeling
techniques that are closely tied to the target application. In our approach, a custom background
modeling technique is defined with the aim of reacting to permanent changes in the scene (e.g.,
luminosity variation) and filtering spurious transitions (e.g., once-off variations) while guaranteeing a
real-time response.


11.3.2 Parking Space Status Analysis 


In order to effectively model the background of the monitored parking space scene, a behavioral
analysis of the possible states of a parking space must be performed. Considering a parking
space identified by a region of interest (ROI), as depicted in Figure 11.2, three main states
are possible: full, partially full/empty, and empty. While the full and empty states do not require
further investigation, the partially full/empty one must be described in more detail. A car that is parking or
leaving the monitored space usually requires several video frames to complete its maneuver,
so that the space moves gradually from full to empty or vice versa. More descriptively,
the partially full/empty state can be called the transition state. Although the transition
state models the car maneuvering, it can also be used to model possible errors due to cars and
people passing through the ROI and causing false empty-to-full transitions.

Figure 11.2 Parking spaces identified by their own ROIs.

Figure 11.3 Markov chain-based parking space model.

The complete parking and leaving process described above can be modeled by means of the
three-state Markov chain depicted in Figure 11.3. In fact, the probability of being in a given state at
time $i + 1$ depends only on the state at time $i$. This observation can be expressed in mathematical
terms as

$$P(s_{i+1} = x_{i+1} \mid s_0 = x_0, s_1 = x_1, \ldots, s_i = x_i) = P(s_{i+1} = x_{i+1} \mid s_i = x_i) \tag{11.1}$$

where
$s_i$ is the status at time $i$
$x_{i+1}, x_i, \ldots, x_1, x_0 \in \{\text{empty}, \text{transition}, \text{full}\}$

The transition probabilities of the Markov chain represent the usage trends of
the parking space and can be experimentally evaluated in time windows. The actual transition
frequencies can be obtained by means of a ground-truth human analysis, and are in turn used
to measure the performance of a given detection algorithm. Better algorithms give transition values
closer to the human ground-truth.
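
As an illustration of how transition frequencies might be evaluated over a time window, the sketch below counts state-to-state transitions in a labeled sequence and normalizes them per starting state. The function name and the representation of the sequence as a list of strings are assumptions made for this example.

```python
from collections import Counter

STATES = ("empty", "transition", "full")


def transition_frequencies(sequence):
    """Estimate P(next state | current state) from an observed sequence of states."""
    counts = Counter(zip(sequence, sequence[1:]))  # consecutive (current, next) pairs
    freqs = {}
    for a in STATES:
        total = sum(counts[(a, b)] for b in STATES)
        for b in STATES:
            # a state never left in this window gets frequency 0 by convention
            freqs[(a, b)] = counts[(a, b)] / total if total else 0.0
    return freqs
```

The same function can be applied to a human-labeled sequence and to the detector output for the same window, and the two resulting tables compared entry by entry.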


11.3.3 Background Modeling 


In BS-based algorithms, the reference background of the scene must be updated at runtime
to react to luminosity variations while filtering once-off changes. In order to create a background
modeling technique that is feasible to implement on a camera network device, simple computer
vision techniques have been implemented and applied following the Markov chain behavioral
model discussed in Section 11.3.2. A background modeling technique specifically designed for the
final application can guarantee good performance while keeping the computational complexity low.

In the proposed parking space monitoring application, the developed background modeling
technique aims at compensating for luminosity variations and once-off changes in a predictable
way starting from knowledge of the system state, thus keeping the background model consistent
with the real system state. The background luminosity variation compensation is performed by
adopting an exponential forgetting algorithm [26]. Each background pixel is updated according
to the following equation:


$$B_{i,j}(t_n) = (1 - \alpha)\, B_{i,j}(t_{n-1}) + \alpha\, I_{i,j}(t_n) \qquad (11.2)$$


where
$B(t_{n-1})$ is the old background
$I(t_n)$ is the last acquired frame
$\alpha$ is the learning rate ($\alpha \in (0, 1)$)
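
The update of Equation 11.2 can be sketched in a few lines. This is only an illustrative version that assumes images stored as nested lists of gray levels; the function name and the learning rate used in the example are hypothetical choices, not values from the chapter.

```python
def update_background(background, frame, alpha):
    """Blend the newest frame into the background, pixel by pixel (Equation 11.2)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in the open interval (0, 1)")
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


# Hypothetical 2 x 2 images and learning rate.
background = [[100.0, 100.0], [100.0, 100.0]]
frame = [[120.0, 100.0], [80.0, 100.0]]
updated = update_background(background, frame, alpha=0.25)
```

A small learning rate makes the background follow slow luminosity drifts while a single outlier frame changes it only marginally.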


The reported background update process for luminosity variation is performed only in the stable
states of the system, empty and full, while it is avoided in the transition state, to fully control the
once-off changes filtering procedure. When a change in the scene is detected (the ROI is partially
occluded by maneuvering or passing cars) and the system moves from empty/full to transition, the
exponential forgetting is not applied until the system moves back into a stable state. When a
transition into one of the two stable states is considered complete, the last acquired image is set as
the background and the exponential forgetting is enabled again. The background update policy as a
function of the system state is depicted in Figure 11.4a through c, where the current state is
identified by a colored area (light gray for the states in which the exponential forgetting is
performed and dark gray otherwise), while a transition from a previous state is identified by a light
gray arrow.


11.3.4 Status Change Detection 


In the three-state Markov chain adopted to describe the parking space behavior, and used
to define the background modeling logic, a transition from one state to another occurs when
a change in the scene is detected. In the proposed work, the change detection is based on a joint
BS and FD approach.

In all possible states of the system, both BS and FD are performed. The BS procedure is
performed by subtracting the background image from the last acquired frame and counting the
difference image pixels (n_BS) whose value is bigger than a TH_D threshold. If n_BS is bigger than
a TH_BS threshold, a possible change in the system status is detected. The FD is based on the same
logic: the previous image frame is subtracted from the last acquired frame and the number of
difference image pixels (n_FD) is evaluated against a TH_FD threshold.
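
The pixel counting shared by BS and FD can be sketched as follows. This is a minimal illustration that assumes the absolute value of the difference is compared with the threshold (the text does not state how the sign is handled); the images and the threshold value are hypothetical.

```python
def n_diff_over_th(image, reference, th_d):
    """Count pixels whose absolute difference from the reference exceeds th_d."""
    return sum(
        1
        for row, ref_row in zip(image, reference)
        for pixel, ref_pixel in zip(row, ref_row)
        if abs(pixel - ref_pixel) > th_d
    )


# Hypothetical 2 x 2 gray-level ROI, background, and previous frame.
roi = [[10, 200], [90, 30]]
bgnd = [[12, 100], [10, 30]]
prev = [[10, 195], [88, 30]]

n_bs = n_diff_over_th(roi, bgnd, th_d=60)  # background subtraction count
n_fd = n_diff_over_th(roi, prev, th_d=60)  # frame differencing count
```

Here the ROI differs strongly from the background but not from the previous frame, which corresponds to a changed yet motionless scene.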

When the system is in one of the two stable states (i.e., empty and full), the condition
n_BS < TH_BS confirms that the system is in a stable state and the background can be updated
by Equation 11.2. In this case, FD can be used to cross-check the results retrieved by BS: the
condition n_FD < TH_FD confirms a lack of dynamics in the scene. If n_BS > TH_BS and n_FD > TH_FD,
a change in the system status is detected and the state of the system moves from empty/full to
transition. The BS output can be seen as a system trigger, enforced by FD, moving the system from a
stable state to the transition one.

Figure 11.4 Transitions due to car parking activities and spurious events. (a) From empty to full
transition, (b) from full to empty transition, and (c) spurious event transitions.


Once the system is in the transition state, the FD is used as the main metric to move to a stable one.
When n_FD > TH_FD, moving objects are still detected (e.g., car maneuvering, people getting out
of the car, etc.) and no change to a stable state is considered. When n_FD < TH_FD holds for a
number of frames bigger than a given TH_N threshold, the system moves to a stable state decided
according to n_BS. If n_BS < TH_BS, the new state is the same one held before the transition;
otherwise a status change has occurred. The FD output can be seen as a system trigger able to move
the system from the transition state to the stable ones: to this end, the TH_N threshold used to
detect the event stability must be kept as low as possible to avoid inefficiencies.

Figure 11.5 Car parking dynamics at 1 fps.

The FD enforcement adopted in the change detection logic avoids the use of computationally
expensive recognition techniques, even if it imposes the use of frame rates that are relatively high
with respect to car parking dynamics (e.g., 1 or 2 fps instead of one image every minute). The use of
frame rates lower than 1 or 2 fps could result in a wrong synchronization between state and
background, thus giving a wrong output regarding the parking space occupancy status. This situation
could happen when a black car leaves the parking space and a white car enters it. With an
excessively long sampling period, the change will be interpreted as a change in the system state from
full to empty (BS above threshold and FD below the threshold in the next frame) even if the
parking space is still full. This is depicted in Figure 11.5; a frame rate of 1 fps is
enough to capture the car parking dynamics.


11.3.5 Confidence Index 


In this section, we describe the logic for deciding whether the parking space is empty or full. The
adopted metric is a confidence index (CI), since it describes the probability of a parking space being
full. The CI is evaluated as a function of time and is retrieved by the parking space occupancy
algorithm. The index is evaluated at the end of each change detection evaluation and then quantized
to an 8-bit value in order to reduce the packet payload in wireless communication. In an application
scenario in which several parking spaces are monitored, several CIs must be sent together with other
possibly acquired data (e.g., temperature, light, and CO2 level); using a small number of bytes
for each status notification reduces the transceiver usage, thus saving energy.

Due to the necessity of describing the parking space status according to the three states of the
Markov chain, the range of CI values has been divided into three main parts, as depicted in
Figure 11.6, each of them mapped to a possible state of the system. The range for the empty state
goes from 0 to T_e, while that for the full state goes from T_f to 255, where T_e and T_f are close to
0 and 255, respectively.

Moving from empty to full, the CI increases as a broken line from 0 to 255 (as shown in
Figure 11.7), following two different behaviors in the transition state. The change detection
procedure splits the transition zone into two parts: a transition unstable zone and a transition stable
zone. The transition unstable zone is close to the previous stable state and represents the period of
time dedicated to entering or leaving the parking space. The transition stable zone, instead,
represents the period of time between the end of the car maneuvering and the transition to the
stable state.

Figure 11.7 Effect of different types of transitions on CI values. (a) From empty to full transition,
(b) from full to empty transition, and (c) spurious event transitions.
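
The mapping between CI ranges and states, together with the one-byte-per-space payload, can be sketched as follows. The boundary values chosen for T_e and T_f, the inclusive handling of the boundaries, and the function names are all assumptions made for illustration; the chapter only states that T_e and T_f are close to 0 and 255.

```python
EMPTY, TRANSITION, FULL = "empty", "transition", "full"


def state_from_ci(ci, t_e=10, t_f=245):
    """Map an 8-bit CI value to one of the three Markov chain states."""
    if not 0 <= ci <= 255:
        raise ValueError("CI must fit in 8 bits")
    if ci <= t_e:
        return EMPTY
    if ci >= t_f:
        return FULL
    return TRANSITION


def pack_cis(cis):
    """Pack one CI per parking space into a payload of one byte each."""
    return bytes(cis)  # raises ValueError if a value is outside 0..255


# Hypothetical CIs for four monitored parking spaces.
payload = pack_cis([0, 127, 255, 250])
states = [state_from_ci(ci) for ci in payload]
```

With this layout, the status of four parking spaces costs four bytes of packet payload.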


11.3.6 Parking Space Occupancy Detection Algorithm Pseudocode 


In this section, we report the algorithm pseudocode while explaining its components with respect 
to the logic described in the previous sections. The pseudocode reported in Algorithm 11.1 is 
applied to a single ROI covering a parking space in the scene.

Algorithm 11.1 Parking space monitoring algorithm pseudocode

/*Initialization step*/
roi = get_ROI();
prev = roi;
bgnd = roi;
state = init_state;
p_state = init_state;
cont = 0;
while 1 do
    /*Get a new image and perform BS and FD*/
    roi = get_ROI();
    n_bs = n_diff_over_th(roi, bgnd);
    n_fd = n_diff_over_th(roi, prev);
    if (state = FULL) or (state = EMPTY) then
        /*Full/empty state analysis*/
        if (n_bs > TH_BS) and (n_fd > TH_FD) then
            p_state = state;
            state = TRANSITION;
        else
            p_state = state;
            state = state;
            bgnd = update_bgnd(roi, bgnd);
        end if
    else if (state = TRANSITION) then
        /*Transition state analysis*/
        if (n_bs > TH_BS) then
            /*Real transition*/
            if (n_fd > TH_FD) then
                /*Transition unstable zone*/
                cont = 1;
            else
                /*Transition stable zone*/
                if (cont > TH_N) then
                    cont = 0;
                    if (p_state = EMPTY) then
                        p_state = TRANSITION;
                        state = FULL;
                        bgnd = roi_copy(roi);
                    else
                        p_state = TRANSITION;
                        state = EMPTY;
                        bgnd = roi_copy(roi);
                    end if
                else
                    /*Count consecutive frames without motion*/
                    cont = cont + 1;
                end if
            end if
        else
            /*Spurious event*/
            state = p_state;
            p_state = TRANSITION;
            cont = 0;
        end if
    end if
    ci = compute_ci(state);
    prev = roi_copy(roi);
end while

The first step in the algorithm is an initialization procedure in which the last acquired ROI
is used as both background and previous frame. At this stage the state of the system must be known
and imposed on the algorithm. This initialization step corresponds to a situation in which the
device is installed and manually configured for use. While the algorithm is running, the BS and FD
procedures are performed for each acquired ROI. When the state is a stable one and the conditions
n_BS > TH_BS and n_FD > TH_FD both occur, the state changes from stable to transition. In all
other cases, the state does not change and the background is updated by the exponential forgetting
algorithm reported in Equation 11.2.

When the state is equal to transition, the condition n_BS < TH_BS means that a spurious event has
happened and the state is reverted to the last stable one, whereas n_BS > TH_BS confirms a possible
status change. In this last case, FD is used to evaluate whether the system enters the transition
stable zone: when this happens, the new state is set and the background updated.
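
The state logic described above can be sketched as a small per-ROI state machine that consumes the two pixel counts. This is an illustrative reading of the algorithm under stated assumptions, not the authors' implementation: the class and attribute names are hypothetical, the counter is assumed to be reset on motion and incremented on every frame without motion, and image handling (background update and copy) is left to the caller through the returned flag.

```python
from dataclasses import dataclass

EMPTY, TRANSITION, FULL = "empty", "transition", "full"


@dataclass
class SpaceMonitor:
    """Per-ROI occupancy state machine driven by the BS and FD pixel counts."""
    state: str          # known initial state, imposed at installation time
    th_bs: int          # BS count threshold (TH_BS)
    th_fd: int          # FD count threshold (TH_FD)
    th_n: int           # number of still frames required (TH_N)
    p_state: str = ""   # state held before the current one
    cont: int = 0       # consecutive still frames seen while in transition

    def __post_init__(self):
        self.p_state = self.p_state or self.state

    def step(self, n_bs, n_fd):
        """Process one frame; return (state, replace_background)."""
        replace_background = False
        if self.state in (EMPTY, FULL):
            self.p_state = self.state
            if n_bs > self.th_bs and n_fd > self.th_fd:
                self.state = TRANSITION
        elif n_bs <= self.th_bs:
            # Spurious event: fall back to the last stable state.
            self.state, self.p_state = self.p_state, TRANSITION
            self.cont = 0
        elif n_fd > self.th_fd:
            # Transition unstable zone: something is still moving.
            self.cont = 0
        else:
            # Transition stable zone: count still frames (assumed increment).
            self.cont += 1
            if self.cont > self.th_n:
                self.state = FULL if self.p_state == EMPTY else EMPTY
                self.p_state = TRANSITION
                self.cont = 0
                replace_background = True
        return self.state, replace_background


# Hypothetical (n_bs, n_fd) counts for a car entering an empty space.
monitor = SpaceMonitor(state=EMPTY, th_bs=4, th_fd=2, th_n=2)
counts = [(0, 0), (10, 8), (10, 5), (10, 0), (10, 0), (10, 0), (0, 0)]
trace = [monitor.step(n_bs, n_fd) for n_bs, n_fd in counts]
```

When the returned flag is set, the caller would copy the current ROI into the background, which is why the last frame of the example reports a BS count of zero.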


11.4 Algorithm Thresholds Tuning and Performance Evaluation 


The parking space status detection algorithm described in Section 11.3 makes it possible to decide
whether a parking space is full or empty as a function of several thresholds used for both BS and FD
algorithms. In this section, we first discuss how to tune the thresholds to reduce possible incorrect
decisions, then we show the algorithm performance in terms of sensitivity, specificity, execution
time, and memory occupancy by means of a real implementation in a WMSN device.


11.4.1 Algorithm Thresholds Tuning 


The effectiveness of the proposed algorithm can be seen as its ability to reflect the real behavioral
trend of the monitored parking spaces. In terms of performance, the algorithm detection capabilities
can be measured with respect to real ground-truth values evaluated by means of a human-based
analysis: better algorithm performance means detection outputs closer to the reference
ground-truth. An effective threshold tuning process must therefore select the thresholds that make
the detection performance consistent with the human ground-truth. To this end, starting from real
images belonging to the IPERDS [27] dataset collected within the IPERMOB [28] project, we first
evaluated the real ground-truth of the tuning image sequences with a human-based frame-by-frame
process, then we tuned all the algorithm thresholds so that its output follows the real trend.

As previously introduced, the image dataset adopted in the tuning process is IPERDS, which
is a collection of grayscale images acquired with a resolution of 160 × 120 pixels at 1 fps
and related to traffic and parking space conditions. All the images composing the dataset have
been collected by using a real WMSN device equipped with a low-cost camera; hence they have all
the necessary characteristics to prototype video streaming and computer vision algorithms targeted
at low-end devices. Among all the IPERDS traces related to parking space monitoring, we selected
one characterized by heavy shadow effects. The selected trace can be considered the most
challenging for the developed algorithm, because shadows in the selected ROI can trigger false
change detections, causing in turn a wrong synchronization between the real status and
the algorithm output.

The real ground-truth of the IPERDS trace adopted to tune the algorithm thresholds has been
evaluated by a human operator with a frame-by-frame analysis. In order to have a human output
in the same range as the algorithm (the CI output range), the empty status has been encoded with the
value 0, the transition with 127, and the full with 255. Two main rules have been imposed in
evaluating the ground-truth: only parking cars can trigger status transitions, thus filtering out
moving people, and a transition ends when the people inside the car leave the monitored ROI. A
snapshot of the adopted tuning trace with the considered parking spaces is reported in Figure 11.8,
while the ground-truth time behavior for each parking space is depicted in Figure 11.9. As can be
seen from the plots, all four parking spaces are characterized by status changes in the considered
time window (more than 15 min), with shadows on neighboring parking spaces. Although the
time-related ground-truth is enough to tune the algorithm thresholds, a secondary outcome of
the performed analysis is the parking space usage trend model. In fact, considering the frequencies
of each event, it is possible to evaluate all the probabilities of the Markov chain introduced in
Section 11.3.2. Table 11.2 reports the parking space probabilities for the selected tuning trace.
Figure 11.8 Parking spaces considered in the tuning trace.

Figure 11.9 Ground-truth confidence index trend for the tuning trace. (a) P11, (b) P12, (c) P13,
and (d) P14.

Table 11.2 Ground-Truth Transition Probabilities for the Tuning Trace

Parking Space ID | P_e   | P_t   | P_f   | P_et  | P_ft  | P_te  | P_tf
P11              | 0.779 | 0.019 | 0.198 | 0.002 | 0.000 | 0.001 | 0.001
P12              | 0.657 | 0.008 | 0.331 | 0.002 | 0.000 | 0.001 | 0.001
P13              | 0.367 | 0.001 | 0.630 | 0.001 | 0.000 | 0.000 | 0.001
P14              | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000

Starting from a human-based ground-truth, it is possible to tune all the algorithm thresholds
by means of a comparison with the algorithm output. Although four thresholds are introduced in
Section 11.3.4, only two of them must be properly tuned: TH_D and TH_N. The two
remaining thresholds, TH_BS and TH_FD, are dependent on TH_D, hence it is possible to set them
equal to a portion of the ROI area while tuning TH_D appropriately. The TH_BS threshold has been
set equal to 1/4 of the ROI area because of its role in detecting changes in the scene
to trigger state transitions, while TH_FD has been set equal to 1/8 of the ROI area because of
its role in guaranteeing event stability. TH_D and TH_N have been jointly varied in the
ranges from 50 to 60 and from 1 to 15, respectively, while evaluating the ground-truth similarity
trend. In mathematical terms, the tuning procedure consists in finding, among the set of possible
TH_D and TH_N threshold combinations, the pair TH = (TH_D, TH_N) that minimizes
the difference between the algorithm output and the human-based ground-truth. As a similarity
measure between algorithm output and ground-truth, we adopted the following relation:


$$S = \sqrt{\sum_{k=1}^{N} \frac{\left(G_{g,k} - G_{a,k}\right)^2}{N}} \qquad (11.3)$$


where
$G_{g,k}$ is the ground-truth value at frame $k$
$G_{a,k}$ is the algorithm output at frame $k$ with a specific TH = (TH_D, TH_N) pair
$N$ is the total number of image frames


S is an averaged Euclidean distance between CI outputs, where lower values indicate a better
similarity between the considered outputs. For threshold tuning purposes, the similarity S has been
calculated for all four parking spaces selected in the tuning trace and then averaged among them in
order to have an overall comparison value among TH = (TH_D, TH_N) combinations. A graphical
representation of the performed analysis is reported in Figure 11.10, where the similarity S is shown
as a function of TH_N for three TH_D values.
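
The tuning procedure can be sketched as an exhaustive search over candidate threshold pairs. The sketch assumes that S is the square root of the mean squared difference between the two CI sequences, which is one reading of "averaged Euclidean distance"; the function names and the tiny CI sequences are hypothetical.

```python
import math


def similarity(ground_truth, output):
    """Root of the mean squared difference between two CI sequences."""
    if not ground_truth or len(ground_truth) != len(output):
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(ground_truth)
    return math.sqrt(sum((g - a) ** 2 for g, a in zip(ground_truth, output)) / n)


def best_thresholds(ground_truths, outputs_by_pair):
    """Return the (th_d, th_n) pair with the lowest S averaged over all spaces.

    outputs_by_pair maps each candidate pair to one CI sequence per space.
    """
    def mean_s(pair):
        scores = [similarity(g, o) for g, o in zip(ground_truths, outputs_by_pair[pair])]
        return sum(scores) / len(scores)

    return min(outputs_by_pair, key=mean_s)


# Hypothetical ground-truth and algorithm outputs for a single parking space.
ground_truths = [[0, 0, 127, 255]]
outputs_by_pair = {
    (50, 1): [[0, 127, 127, 255]],
    (60, 5): [[0, 0, 127, 255]],
}
best = best_thresholds(ground_truths, outputs_by_pair)
```

In practice each candidate output would come from running the detection algorithm on the tuning trace with that threshold pair.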

The performed similarity analysis shows that TH_D must be set larger than 55. Adopting the
lowest selected value, 50, the problem pointed out at the beginning of this section occurs: the
parking space P14 loses the state/background synchronization due to luminosity variations
caused by shadows (Figure 11.11). This behavior is confirmed by higher values of S for TH_D equal
to 50 (Figure 11.10a). Regarding TH_N, the analysis indicates that any value bigger than 1 is
suitable; bigger values can be adopted to improve spurious transition filtering with no appreciable
difference in similarity. As a consequence, we selected TH_D and TH_N equal to 60 and 5,
respectively, in order to better filter spurious transitions and guarantee a correct state
stabilization. Algorithm CI outputs with TH_D = 60 and TH_N = 5 are depicted in Figure 11.12 for
all considered parking spaces; note the strong similarity with the human-based ground-truth shown
in Figure 11.9.

Figure 11.10 Similarity trend analysis. (a) TH_D = 50, (b) TH_D = 55, and (c) TH_D = 60.

Figure 11.11 Wrong state/background synchronization behavior in P14.

A validation process regarding the chosen threshold values can be easily performed by evaluating
the Markov chain transition probabilities produced by the algorithm and comparing them with
the ones obtained by the human ground-truth analysis. The algorithm transition probabilities
with the adopted threshold values are reported in Table 11.3. Comparing these results with those
reported in Table 11.2 by means of the overall Euclidean distance between the vector of
ground-truth probabilities and that of the algorithm probabilities, the distances for the considered
parking spaces are small: 0.019 for P11, 0.004 for P12, 0.007 for P13, and 0.000 for P14.
Moreover, it must be noticed that the differences among the probabilities in Tables 11.2 and
11.3 are smallest for the stable states (P_e and P_f), while the biggest differences are reached in the
transition state (P_t), where the human ground-truth is substantially different from the algorithm
output.

Figure 11.12 Algorithm confidence index trend for the tuning trace (TH_D = 60, TH_N = 5). (a)
P11, (b) P12, (c) P13, and (d) P14.

Table 11.3 Algorithm Transition Probabilities for the Tuning Trace (TH_D = 60, TH_N = 5)

Parking Space ID | P_e   | P_t   | P_f   | P_et  | P_ft  | P_te  | P_tf
P11              | 0.780 | 0.005 | 0.211 | 0.002 | 0.000 | 0.001 | 0.001
P12              | 0.656 | 0.005 | 0.333 | 0.003 | 0.000 | 0.002 | 0.001
P13              | 0.366 | 0.005 | 0.623 | 0.002 | 0.001 | 0.001 | 0.002
P14              | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000
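
The distance used in this validation can be reproduced from the tabulated values. The sketch below copies the P11 and P12 rows of Tables 11.2 and 11.3 and uses the standard library function `math.dist`; the variable names are illustrative only.

```python
import math

# Rows of Table 11.2 (ground-truth) and Table 11.3 (algorithm), seven values each.
ground_truth = {
    "P11": [0.779, 0.019, 0.198, 0.002, 0.000, 0.001, 0.001],
    "P12": [0.657, 0.008, 0.331, 0.002, 0.000, 0.001, 0.001],
}
algorithm = {
    "P11": [0.780, 0.005, 0.211, 0.002, 0.000, 0.001, 0.001],
    "P12": [0.656, 0.005, 0.333, 0.003, 0.000, 0.002, 0.001],
}

# Euclidean distance between the two probability vectors of each parking space.
distances = {
    space: round(math.dist(ground_truth[space], algorithm[space]), 3)
    for space in ground_truth
}
```

For P11 almost all of the distance comes from the P_t and P_f entries, in line with the observation that the transition state is where the human and algorithm outputs differ most.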


11.4.2 Algorithm Occupancy Status Detection Performance 


The detection performance of the developed algorithm has been evaluated by means of simulations
using an algorithm implementation suitable to run in real embedded devices. By adopting the
threshold values selected in Section 11.4.1, the algorithm sensitivity and specificity [29] have been
evaluated using IPERDS traces characterized by high movements in the scene with luminosity
variations and regular shadows.

Figure 11.13 Parking spaces considered in a testing trace.

The sensitivity of the algorithm has been evaluated as the number of true positive events over
the sum of true positive and false negative events, as reported in Equation 11.4, and indicates
the ability of the algorithm to correctly detect full state events. The specificity has been
evaluated as the number of true negative events over the sum of true negative and false positive
events (see Equation 11.4) and indicates the ability of the algorithm to correctly detect empty
state events. The developed algorithm reaches 99.92% sensitivity and 95.59% specificity. As these
results show, the proposed algorithm with a proper tuning process can correctly detect the status
of a parking space in both full and empty conditions:

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP} \qquad (11.4)$$
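
The two metrics of Equation 11.4 can be computed from per-frame labels as in the following minimal sketch, where True stands for a full parking space. The function name and the sample labels are hypothetical and do not reproduce the chapter's test traces.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for boolean labels, True meaning full."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both full and empty events are needed")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical ground-truth and detected occupancy for eight frames.
truth = [True, True, True, False, False, False, False, True]
predicted = [True, True, False, False, False, True, False, True]
sensitivity, specificity = sensitivity_specificity(truth, predicted)
```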


Considering a testing trace in which four parking spaces are monitored (Figure 11.13), a
graphical comparison between the human-based ground-truth and the algorithm output
(Figure 11.14 versus Figure 11.15) confirms the algorithm's capability of detecting the actual
occupancy status of the parking spaces.


11.4.3 Algorithm Performance in the SEED-EYE Camera Network 


The algorithm performance in terms of execution time and memory occupancy has been evaluated
by means of a real implementation in the SEED-EYE [30] camera network device. The
SEED-EYE board, depicted in Figure 11.16, is an innovative camera network device developed
by Scuola Superiore Sant'Anna and Evidence within the IPERMOB [28] project. The board is
equipped with a Microchip PIC32 microcontroller working at a frequency of 80 MHz and embedding
512 kB of Flash and 128 kB of RAM. The device mounts a CMOS camera that can be
programmed to acquire images at various resolutions (up to 640 × 480) and frame rates (up to
30 fps). As network interfaces, an IEEE 802.15.4-compliant transceiver and an IEEE 802.3 module
have been installed in order to enable wireless communications among peer devices and possible
connections to backhauling networks. The SEED-EYE board has been specifically designed to
support highly demanding multimedia applications while requiring low power consumption during
image acquisition and processing. Performance evaluations executed in the laboratory have shown
that the board can acquire and process 160 × 120 images at 30 fps while experiencing a maximum
power consumption of 450 mW when all the peripherals are activated. Lower power consumption
values can be achieved by reducing the image acquisition frame rate.

Figure 11.14 Ground-truth confidence index trend for a testing trace. (a) P21, (b) P22, (c) P23,
and (d) P24.

Figure 11.15 Algorithm confidence index trend for a testing trace (TH_D = 60, TH_N = 5). (a)
P21, (b) P22, (c) P23, and (d) P24.

To evaluate the algorithm execution time in the SEED-EYE camera network device, the
algorithm implementation adopted in Section 11.4.2 has been ported as a custom application
on top of the Erika Enterprise (EE) [31,32] OS, an innovative real-time operating system
for small microcontrollers that provides an easy and effective way of managing tasks. In more
detail, Erika is a multiprocessor real-time operating system kernel implementing a collection
of application programming interfaces similar to those provided by the OSEK/VDX standard
for automotive embedded controllers. The algorithm execution time on top of EE OS has been
measured by performing several execution runs while considering a single parking space. The
performance evaluation results are presented in Figure 11.17 in terms of execution time distribution.
As can be seen from the plot, the algorithm execution time is not constant because the
priority-based scheduling policies adopted in EE OS give higher priority to basic operating system
tasks. Overall, the algorithm shows an average execution time of 1.37 ms with a standard
deviation of 0.05 ms.

Figure 11.16 The SEED-EYE board developed within the IPERMOB project.

Figure 11.17 Execution time distribution.

Regarding the memory occupancy of both Flash and RAM, the values have been obtained
with Microchip tools and are equal to 80.5 kB for Flash and 26.7 kB for RAM. The total required
memory (107.2 kB) is equal to 16.75% of the 640 kB of Flash and RAM available on the board.


11.5 Conclusions


In this chapter, we focus on the development of onboard image processing techniques for detecting
the occupancy status of parking spaces. The developed techniques are presented as an effective
solution for vehicle parking lot monitoring applications in the domain of ITS. Starting from the
adoption of classical BS techniques, we propose a background modeling process specially designed
for the considered application in order to follow a low-complexity approach. Moreover, the
chapter presents in detail the process for appropriately tuning all the algorithm parameters,
starting from a comparison with a human-based ground-truth. In a real implementation on
a camera network device, the developed algorithm reaches 99.92% sensitivity and 95.59%
specificity in detecting the occupancy status of parking spaces, while showing an average execution
time of 1.37 ms with a memory occupancy of 80.5 kB in Flash and 26.7 kB in RAM.


12.1 Introduction


12.1.1 Current Condition of Civil Infrastructure Systems 


It is widely accepted and acknowledged that the civil infrastructure systems (CIS) all around the 
world are aging and becoming more vulnerable to damage and deterioration. As an example, many 
of the 600,000 highway bridges existing today in the United States were constructed from 1950 to 
1970 for the interstate system. Having an approximately 50 year design life, most of these bridges 
are either approaching or have surpassed their intended design life (Figure 12.1). Highway agencies 
are struggling to keep up with the increasing demands on their highways, and deteriorating bridges 
are becoming severe choke points in the transportation network. It is estimated that more than 
25% of the bridges in the United States (~150,000) are either structurally deficient or functionally 
obsolete and that it will cost $1.6 trillion to eliminate all bridge deficiencies in the United States. 
In addition to highways and bridges, similar problems exist for other CIS such as buildings, energy 
systems, dams, levees, and water systems. Degradations, accidents, and failures indicate that there 
is an urgent need for complementary and effective methods for current assessment and evaluation 
of the CIS. At this point, structural health monitoring (SHM) offers a very promising tool for 
tracking and evaluating the condition and performance of different structures and systems by 
means of sensing and analysis of objective measurement data. 


12.1.2 Structural Health Monitoring of CIS 


SHM is the research area focusing on condition assessment of different types of structures including
aerospace, mechanical, and civil structures. Though the earliest SHM applications were in aerospace
engineering, mechanical and civil applications have gained momentum in the last few decades.
Different definitions of SHM can be found in the engineering literature. For example, Aktan
et al. (2000) defined SHM as follows: SHM is the measurement of the operating and loading
environment and the critical responses of a structure to track and evaluate the symptoms of
operational incidents, anomalies, and/or deterioration or damage indicators that may affect
operation, serviceability, or safety and reliability. Another definition was given by Farrar et al.
(1999) and Sohn et al. (2001), who stated that SHM is a statistical pattern recognition process
to implement a damage detection strategy for aerospace, civil, and mechanical engineering
infrastructure and that it is composed of four portions: (1) operational evaluation; (2) data
acquisition, fusion, and cleansing; (3) feature extraction; and (4) statistical model development.

Figure 12.1 Seymour Bridge in Cincinnati, Ohio, was constructed in 1953 and was decommissioned
after approximately 50 years of service. General view (a) and the condition of the deck
at the time of decommissioning (b).

Figure 12.2 Sunrise Bridge, a bascule-type movable bridge in Ft. Lauderdale, Florida, was
instrumented with more than 200 sensors for monitoring of structural, mechanical, and electrical
components: bridge when opened (a) and instrumentation of the structural elements (b).

The starting point of an SHM system may be considered as the sensing and data acquisition 
step. The properties of the data acquisition system and the sensor network are usually application 
specific. The number and types of the sensors have a direct effect on the accuracy and the reliability 
of the monitoring process. With the recent technological developments in reduced cost sensing 
technologies, large amounts of data can be acquired easily with different types of sensors. The data 
collected during an SHM process generally include the response of the structure at different locations 
and information about the environmental and operational conditions. The measurements related 
to the structural response may include strain, acceleration, velocity, displacement, rotation, and 
others (Figure 12.2). On the other hand, data related to environmental and operational conditions
may include temperature, humidity, wind speed, weigh-in-motion measurements, and others.

After collection, SHM data can be analyzed by means of various methodologies to obtain useful 
information about the structure and its performance. Unless effective data analysis methodologies
are implemented in an SHM system, problems related to data management will be inevitable. This
may not only make it overwhelming to handle large amounts of data effectively but also
cause critical information to be missed. In addition to the analysis of experimental data, interpretation
might require modeling and simulation where the analytical and numerical results may be combined 
or compared with experimental findings. Finally, information extracted from the data is used for 
decision-making about the safety, reliability, maintenance, operation, and future performance of 
the structure. 


12.2 Data Analysis and Processing for SHM of CIS 


Although sometimes SHM is used (rather incorrectly) as a synonym for damage detection, it
actually refers to a much broader research area that can be employed for different purposes such
as validation of the properties of a new structure, long-term monitoring of an existing structure,
structural control, and many others. Brownjohn et al. (2004) present a good review of civil
infrastructure SHM applications. On the other hand, it should also be emphasized that damage
detection is a very critical component of SHM. Identifying the presence of damage might be
considered the first step toward taking preventive actions and understanding the root causes of
the problem.

Figure 12.3 Brooklyn Bridge in New York City (a) was tested with 43 accelerometers to obtain
the dynamic properties of the tower and deck: lateral bending mode of tower at 1.8262 Hz (b).

Various methodologies have been proposed for detecting damage using SHM data. For global 
condition assessment, most of these methodologies employ vibration data by using one or a 
combination of different time domain and/or frequency domain algorithms (Figure 12.3). The 
aim is to extract features that will be sensitive to the changes occurring in the structure and 
relatively insensitive to other interfering effects (e.g., noise and operational and environmental 
effects). Some of these methodologies can be found in the literature and the references therein (Hogue 
et al. 1991; Toksoy and Aktan 1994; Doebling et al. 1996; Worden et al. 2000; Sohn and 
Farrar 2001; Bernal 2002; Chang et al. 2003; Kao and Hung 2003; Sohn et al. 2003; Giraldo and 
Dyke 2004; Lynch et al. 2004; Alvandi and Cremona 2005; Nair and Kiremidjian 2005; Catbas 
et al. 2006; Sanayei et al. 2006; Carden and Brownjohn 2008; Gul and Catbas 2008, 2009; Gul 
and Catbas 2011). 


12.2.1 Parametric Data Analysis Using Modal Models 


A considerable number of damage detection efforts focus on parametric methods that generally 
assume that the a priori model related to the physical characteristics of the system is known. The aim 
of such methods is usually to compute the unknown parameters of this model. These parameters 
are mostly related to physical quantities such as mass, damping, and stiffness of the system, and the 
change in these parameters is used for damage detection. 

Although a variety of parametric methods exist for damage detection applications, modal 
parameter estimation is one of the most commonly used parametric system identification approaches, 
where the aim is to identify the unknown parameters of a modal model (modal frequencies, 
damping ratios, mode shape vectors, and modal scalings) of the system from given input-output 
or output-only data sets. One of the advantages of using the modal parameters is that they can 
easily be related to the physical characteristics of the structure. Therefore, a large body of research 
has investigated modal parameter-based damage indices for SHM. 
A very comprehensive and highly referenced review by Doebling et al. (1996) discusses and 
summarizes different methodologies to identify the damage by using the modal parameters and 
modal parameter-based damage indices. 

Modal parameter estimation research has yielded different methods and approaches, especially 
for mechanical and aerospace engineering applications, in the last four to five decades. 
More recently, research for civil infrastructure applications using different forms of input-output 
or output-only dynamic tests has contributed new and/or revised algorithms, methods, and 
methodologies. These can be mainly grouped into two categories according to their working 
domain, i.e., time and frequency. There are several insightful review studies in the literature that 
would provide more detailed information about some of these methods (Maia and Silva 1997; 
Allemang and Brown 1998; Ewins 2000; Fu and He 2001; Peeters and Ventura 2003; Alvandi and 
Cremona 2005). 


12.2.1.1 Modal Parameter Identification in Time Domain 


Time domain modal parameter identification methods, as the name implies, extract the modal 
information from the time history data. These methods are generally developed from control 
theory concepts. The starting point of these methodologies is usually the free response or the 
pulse response of the system. However, most of these methodologies can be used with ambient 
vibration data after some preprocessing of the raw data to obtain an estimation of the free decay 
time response data. 

Although these methodologies are usually numerically stable and give satisfactory results, their 
application to heavily damped systems is limited since they require a large amount of time 
domain data. Some of the widely used methods include the complex exponential algorithm (CEA), 
polyreference time domain (PTD) method, Ibrahim time domain (ITD) method, and eigensystem 
realization algorithm (ERA). Detailed discussions about the time domain methodologies 
can be found in the literature including Maia and Silva (1997), Allemang and Brown (1998), 
Allemang (1999), and Peeters and Ventura (2003). Since ERA is one of the widely used time 
domain methods and it provides a generic framework for CEA and ITD, the following discussion 
gives more details about this technique. ERA was developed by Juang and Pappa (1985) and is 
based on minimal realization theory: it seeks a state space system of minimum order that represents 
a given set of input-output relations. 

A discrete time nth-order state space system with r inputs and m outputs can be written as 

$$
\begin{aligned}
x(k+1) &= A_D\,x(k) + B_D\,u(k) \\
y(k) &= C_D\,x(k) + D_D\,u(k)
\end{aligned}
\tag{12.1}
$$

where 
$y_{m \times 1}(k)$ is the output vector 
$u_{r \times 1}(k)$ is the input vector 
$x_{n \times 1}(k)$ is the state vector 
$A_{D,\,n \times n}$, $B_{D,\,n \times r}$, $C_{D,\,m \times n}$, and $D_{D,\,m \times r}$ are the time-invariant system matrices 


If the system is assumed to be excited with a unit impulse function and if the initial conditions of 
the system are zero as shown in Equation 12.2, then the response of the system can be calculated as 
written in Equation 12.3: 

$$
\begin{aligned}
u(k) &= 1, \quad x(k) = 0 \quad \text{for } k = 0 \\
u(k) &= 0, \quad x(k) \neq 0 \quad \text{for } k \neq 0
\end{aligned}
\tag{12.2}
$$

$$
y(0) = D_D,\quad y(1) = C_D B_D,\quad y(2) = C_D A_D B_D,\quad \ldots,\quad y(k) = C_D A_D^{k-1} B_D
\tag{12.3}
$$


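The parameters shown in Equation 12.3 are known as Markov parameters. These parameters are 
collected in a so-called Hankel matrix, denoted with H, as in Equation 12.4: 

$$
H(0) =
\begin{bmatrix}
y(1) & y(2) & \cdots & y(i) \\
y(2) & y(3) & \cdots & y(i+1) \\
\vdots & \vdots & \ddots & \vdots \\
y(j) & y(j+1) & \cdots & y(i+j-1)
\end{bmatrix}
=
\begin{bmatrix}
C_D B_D & C_D A_D B_D & \cdots & C_D A_D^{i-1} B_D \\
C_D A_D B_D & C_D A_D^{2} B_D & \cdots & C_D A_D^{i} B_D \\
\vdots & \vdots & \ddots & \vdots \\
C_D A_D^{j-1} B_D & C_D A_D^{j} B_D & \cdots & C_D A_D^{i+j-2} B_D
\end{bmatrix}
\tag{12.4}
$$

where i and j are the numbers of block columns and block rows in the Hankel matrix, respectively. After 
building the Hankel matrix, the system matrices are retrieved by using singular value decomposition 
(SVD) of the Hankel matrix: 

$$
H(0) = U S V^T =
\begin{bmatrix} U_1 & U_2 \end{bmatrix}
\begin{bmatrix} S_1 & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix}
\tag{12.5}
$$

where 
U and V are unitary matrices 
S is a square diagonal matrix 

The system matrices can be obtained by using the following: 

$$ A_D = S_1^{-1/2}\, U_1^T\, H(1)\, V_1\, S_1^{-1/2} \tag{12.6} $$

$$ B_D = S_1^{1/2}\, V_1^T\, E_r \tag{12.7} $$

$$ C_D = E_m^T\, U_1\, S_1^{1/2} \tag{12.8} $$

where $H(1)$ is the Hankel matrix built from the Markov parameters shifted by one time step, 
$E_r^T = [I_{r \times r}\ 0\ \cdots\ 0]$, and $E_m^T = [I_{m \times m}\ 0\ \cdots\ 0]$. The modal parameters can be extracted by 
using the system matrices obtained with earlier equations. The natural frequencies can be obtained 
directly from the eigenvalues of the system matrix $A_D$. The mode shapes can be computed by 
multiplying the corresponding eigenvectors of $A_D$ with the output matrix $C_D$. 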
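
The ERA steps in Equations 12.4 through 12.8 can be illustrated with a short numerical sketch. The code below is only a minimal illustration, not the authors' implementation: the function name `era`, the helper `modal_block`, the Hankel dimensions, and the synthetic two-mode system (1.0 Hz and 2.5 Hz) are all assumptions made for the example. It also assumes that the discrete eigenvalues are converted to continuous-time poles with a logarithm, which holds only when the sampling is fast enough to avoid aliasing.

```python
import numpy as np

def era(markov, n_block_rows, n_block_cols, order, dt):
    """Eigensystem realization sketch.

    markov[k] holds the m x r Markov parameter y(k+1) = C A^k B, k = 0, 1, ...
    At least n_block_rows + n_block_cols parameters are required.
    """
    m, r = markov[0].shape

    def hankel(shift):
        # Block (a, b) of H(shift) is y(a + b + 1 + shift).
        return np.block([[markov[a + b + shift] for b in range(n_block_cols)]
                         for a in range(n_block_rows)])

    H0, H1 = hankel(0), hankel(1)
    U, s, Vt = np.linalg.svd(H0, full_matrices=False)
    U1, V1 = U[:, :order], Vt[:order, :].T
    s_half = np.sqrt(s[:order])

    # Equations 12.6 to 12.8; diagonal scalings are applied by broadcasting.
    A = (U1 / s_half).T @ H1 @ (V1 / s_half)
    B = (s_half[:, None] * V1.T)[:, :r]
    C = (U1 * s_half)[:m, :]

    # Discrete eigenvalues -> continuous poles (assumes no aliasing).
    lam_discrete, vectors = np.linalg.eig(A)
    lam_continuous = np.log(lam_discrete.astype(complex)) / dt
    freq_hz = np.abs(lam_continuous) / (2.0 * np.pi)
    damping = -lam_continuous.real / np.abs(lam_continuous)
    mode_shapes = C @ vectors
    return A, B, C, freq_hz, damping, mode_shapes

def modal_block(freq_hz, zeta, dt):
    """2 x 2 discrete-time block for one damped mode (illustrative helper)."""
    wn = 2.0 * np.pi * freq_hz
    wd = wn * np.sqrt(1.0 - zeta**2)
    decay = np.exp(-zeta * wn * dt)
    c, s = np.cos(wd * dt), np.sin(wd * dt)
    return decay * np.array([[c, s], [-s, c]])

# Synthetic system: two modes, one input, two outputs (all values are made up).
dt = 0.02
A_true = np.zeros((4, 4))
A_true[:2, :2] = modal_block(1.0, 0.02, dt)
A_true[2:, 2:] = modal_block(2.5, 0.01, dt)
B_true = np.array([[1.0], [0.0], [1.0], [0.0]])
C_true = np.array([[1.0, 0.0, 1.0, 0.0], [0.5, 0.0, -1.0, 0.0]])

markov = [C_true @ np.linalg.matrix_power(A_true, k) @ B_true for k in range(80)]
A, B, C, freq_hz, damping, shapes = era(markov, 40, 40, order=4, dt=dt)
print(np.sort(freq_hz), np.sort(damping))
```

With noise-free Markov parameters, the Hankel matrix has exactly four nonzero singular values, so the model order is obvious. With measured data, the order is usually chosen by inspecting the singular value drop, and spurious modes then have to be screened separately.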


12.2.1.2 Modal Parameter Identification in Frequency Domain 


Frequency domain methods transform the time histories to the frequency domain and extract the 
modal parameters there. These methodologies use the frequency response 
functions (FRFs) to compute the modal parameters. One of the main advantages of the frequency 
domain methodologies is that fewer computational modes (noise modes) are obtained in comparison 
with time domain algorithms. Some of the disadvantages of these methodologies are due to the 
restrictions of the fast Fourier transform (FFT). For example, leakage is one of the commonly 
encountered problems because FFT assumes that the signal is periodic within the observation time. 
The effect of leakage can be reduced by using windowing functions, but it cannot be avoided 
completely. 

One of the simplest frequency domain methods is the peak picking method, where the modes are 
selected from the peaks of the FRF plots. If the system is lightly damped and if the modes are well 
separated, the natural frequencies (eigenfrequencies) can be estimated from the FRF plots. The 
damping ratios can be determined by using the half-power method. The rational fraction polynomial 
method is a high-order frequency domain methodology in which the FRF is written as a ratio of two 
polynomials in $j\omega$, as in Equation 12.9. The coefficients of these polynomials can be estimated from the FRF 
measurements by using a linear least squares solution. Then the modal parameters are computed 
by using the polynomial coefficients: 

$$
H(j\omega) = \frac{(j\omega)^{n}\beta_{n} + (j\omega)^{n-1}\beta_{n-1} + \cdots + \beta_{0}}
                  {(j\omega)^{m}\alpha_{m} + (j\omega)^{m-1}\alpha_{m-1} + \cdots + \alpha_{0}}
\tag{12.9}
$$

where 
$H$ is the FRF 
$\omega$ is the frequency in rad/s 
$\alpha$, $\beta$ are the polynomial coefficients 
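
The peak picking and half-power steps mentioned above can be sketched for a single, well-separated mode. This is a minimal illustration under stated assumptions: the function name `half_power_damping`, the synthetic single-degree-of-freedom receptance FRF, and its parameters (2 Hz, 2% damping) are invented for the example, and the estimate uses the common light-damping approximation that the damping ratio is the half-power bandwidth divided by twice the peak frequency.

```python
import numpy as np

def half_power_damping(freq, mag):
    """Estimate peak frequency and damping ratio from one FRF magnitude peak."""
    k = int(np.argmax(mag))
    target = mag[k] / np.sqrt(2.0)          # half-power level

    lo = k
    while lo > 0 and mag[lo] > target:      # walk left until below the level
        lo -= 1
    hi = k
    while hi < len(mag) - 1 and mag[hi] > target:   # walk right
        hi += 1

    # Linear interpolation of the two half-power frequencies.
    f1 = np.interp(target, [mag[lo], mag[lo + 1]], [freq[lo], freq[lo + 1]])
    f2 = np.interp(target, [mag[hi], mag[hi - 1]], [freq[hi], freq[hi - 1]])

    f_peak = freq[k]
    zeta = (f2 - f1) / (2.0 * f_peak)
    return f_peak, zeta

# Synthetic receptance FRF of an SDOF system (values chosen for illustration).
fn_true, zeta_true, mass = 2.0, 0.02, 1.0
stiffness = mass * (2.0 * np.pi * fn_true) ** 2
damping_c = 2.0 * zeta_true * np.sqrt(stiffness * mass)

freq = np.linspace(1.0, 3.0, 20001)
omega = 2.0 * np.pi * freq
frf = 1.0 / (stiffness - mass * omega**2 + 1j * damping_c * omega)

f_peak, zeta_est = half_power_damping(freq, np.abs(frf))
print(f_peak, zeta_est)
```

The approach degrades when modes are closely spaced or the frequency resolution is coarse relative to the half-power bandwidth, which is one reason the higher-order methods discussed in the text are preferred in practice.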


Another method, the complex mode indicator function (CMIF), identifies the modal parameters 
by using the SVD of the output spectrum matrix. It is described in detail in the following 
since it is the methodology used for the examples presented in this text. Shih et al. (1988a) initially 
introduced CMIF as a mode indicator function for MIMO data to determine the number of modes 
for modal parameter estimation. Then, CMIF was successfully used as a parameter estimation 
technique to identify the frequencies and unscaled mode shapes of idealized test specimens (Shih 
et al. 1988b; Fladung et al. 1997). Catbas et al. (1997, 2004) modified and further extended CMIF 
to identify all of the modal parameters including the modal scaling factors from MIMO test data. 
In these studies, it was shown that CMIF is able to identify physically meaningful modal parameters 
from the test data, even if some level of nonlinearity and time variance were observed. Figure 12.4 
shows the basic steps of the methodology. 

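The first step of the CMIF method is to compute the SVD of the FRF matrix, which is given in 
Equation 12.10: 

$$
[H(\omega_i)]_{N_o \times N_i} = [U]_{N_o \times N_i}\,[S]_{N_i \times N_i}\,[V]^H_{N_i \times N_i}
\tag{12.10}
$$

where 
$N_o$ and $N_i$ are the numbers of response (output) and excitation (input) points, respectively 
$[S]$ is the singular value matrix 
$[U]$ and $[V]$ are the left and right singular vector matrices, respectively 
$[V]^H$ indicates the conjugate transpose of $[V]$ 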


Figure 12.4 Summary of CMIF method. 

The earlier equation shows that the columns of the FRF matrix $[H(\omega_i)]$ are linear combinations 
of the left singular vectors and, similarly, the rows of the FRF matrix are linear combinations of the 
right singular vectors. Since the left and right singular vector matrices are unitary, the amplitude 
information of the FRF matrix is carried within the singular value matrix. The CMIF plot shows 
the singular values as a function of frequency. The number of singular values at a spectral line, and 
therefore the number of lines in the CMIF plot, depends on the number of excitation points, $N_i$ 
in Equation 12.10, assuming that $N_i$ is smaller than the number of response points, $N_o$. For this reason, MIMO data CMIF plots 
indicate multiple lines enabling tracking of the actual physical modes of the structure. The FRF 
matrix can be expressed in a way different from Equation 12.10 in terms of modal expansion using 
individual real or complex modes as given in Equation 12.11: 


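$$
[H(\omega_i)]_{N_o \times N_i} = [\Psi]\left[\frac{1}{j\omega_i - \lambda_r}\right][L]^T
\tag{12.11}
$$

where 
$\omega$ is frequency 
$\lambda_r$ is the rth complex eigenvalue or system pole 
$[\Psi]$ is the matrix of mode shapes 
$[L]$ is the matrix of participation vectors 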

Note that while Equation 12.10 is a numerical decomposition, Equation 12.11 incorporates the 
physical characteristics, such as mode shapes and frequencies. As mentioned before, the left and 
right singular vector matrices $[U]$ and $[V]$ are unitary matrices in SVD formulation. Furthermore, 
$[\Psi]$ and $[L]$ are constant for a particular mode. The system pole and the driving frequency are 
closest along the frequency line near resonance, which results in a local maximum in the CMIF plot. 
Therefore, there is a very high possibility that the peak singular values in the CMIF plot correspond to the pole 
locations of the system. In addition, since the left singular vector is the response at the resonance, it 
is a good approximation of the modal vector at that frequency. Then these modal vectors are used 
as modal filters to condense the measured response to as many single-degree-of-freedom (SDOF) 
systems as the number of selected peaks. This, in fact, is a transformation of the response from 
physical coordinates to the modal coordinates. After the transformation is completed, enhanced 
frequency response functions (eFRFs) are calculated for each SDOF system. Then the entire system 
is uncoupled to a vector of SDOF systems for mode m with Equation 12.12: 


$$
eH(\omega_i)_m = \{U\}_m^H\,[\Psi]\left[\frac{1}{j\omega_i - \lambda_r}\right][L]^T\,\{V\}_m
\tag{12.12}
$$


The level of the enhancement depends on the inner product of the left singular vector and the 
modal vector $[\Psi]$. If the modal vectors are mutually orthogonal, then the eFRF will be completely 
uncoupled, showing a single-mode FRF with a strong peak. However, if some of the modes are 
non-orthogonal, then those modes will have some contribution to the eFRF, which will cause 
another peak or peaks to appear. 

After obtaining the set of SDOF systems, the second part of the method determines 
the modal frequency, damping, and modal scaling for each separate mode. Since the system is now 
transformed to a set of SDOF systems using the eFRFs, the following equation can be written in 
the frequency domain to compute the system poles: 

$$
\left[(j\omega_i)^2\alpha_2 + (j\omega_i)\alpha_1 + \alpha_0\right]\{eH(\omega_i)\}
= \left[(j\omega_i)^2\beta_2 + (j\omega_i)\beta_1 + \beta_0\right]\{R(\omega_i)\}
\tag{12.13}
$$

In Equation 12.13, $\{R(\omega_i)\}$ is the index vector showing the coordinates of the forcing locations and 
$\alpha$ and $\beta$ are unknown coefficients. Since the eFRF matrix is generated in the first phase, $eH(\omega_i)$ 
and $(j\omega_i)$ are now known quantities. If there is no noise or residual terms in the data sets, just 
$\beta_0$ is sufficient, but to handle the noise, $\beta_1$ and $\beta_2$ can be added to the right-hand side of the 
equation to enhance the results (Catbas et al. 1997; Fladung et al. 1997). If either $\alpha_0$ or $\alpha_2$ is 
assumed to be unity, a least-squares solution can be applied and then the eigenvalue problem can be 
formulated and solved for the poles of the SDOF system. The poles ($\lambda_r = \sigma_r + j\omega_r$) of the system 
are determined on a mode-by-mode basis. 
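
The first phase of CMIF (SVD of the FRF matrix at each spectral line, followed by peak selection on the singular value curve) can be sketched as follows. This is an illustrative sketch only: the function names `cmif` and `pick_peaks`, the 3-DOF spring-mass chain, its proportional damping, and the choice of two excitation points are assumptions for the example. The second phase (eFRF computation and pole fitting with Equation 12.13) is not included.

```python
import numpy as np

def cmif(frf):
    """SVD of an FRF array of shape (n_freq, N_o, N_i) at every spectral line."""
    U, s, _ = np.linalg.svd(frf, full_matrices=False)
    return s, U              # s: (n_freq, N_i), U: (n_freq, N_o, N_i)

def pick_peaks(curve, n_peaks):
    """Indices of the n_peaks largest local maxima of a 1D curve."""
    interior = np.arange(1, len(curve) - 1)
    is_peak = (curve[1:-1] > curve[:-2]) & (curve[1:-1] > curve[2:])
    candidates = interior[is_peak]
    strongest = candidates[np.argsort(curve[candidates])[::-1][:n_peaks]]
    return np.sort(strongest)

# Synthetic 3-DOF chain with light proportional damping (made-up values).
M = np.eye(3)
K = 1000.0 * np.array([[2.0, -1.0, 0.0],
                       [-1.0, 2.0, -1.0],
                       [0.0, -1.0, 1.0]])
C = 0.05 * M + 0.0002 * K

omega = np.linspace(1.0, 80.0, 8000)      # rad/s
inputs = [0, 2]                           # excitation points, N_i = 2
frf = np.array([np.linalg.inv(K - w**2 * M + 1j * w * C)[:, inputs]
                for w in omega])

s, U = cmif(frf)
peaks = pick_peaks(s[:, 0], n_peaks=3)    # peaks of the first CMIF line
identified_omega = omega[peaks]
identified_shapes = U[peaks, :, 0]        # left singular vectors at the peaks

print(identified_omega)
```

Because the modes of this example are well separated, the first singular value line alone reveals all three. For repeated or closely spaced modes, the second line of the CMIF plot also peaks, which is the main benefit of using MIMO data.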


12.2.1.3 Ambient Vibration Data Analysis 


As mentioned earlier, the ideal case for modal parameter identification is one where both the 
input and output data are available. However, for most real-life applications for CIS, such 
experimental setups cannot be implemented, since it is generally not feasible to 
excite large constructed structures with a known input, especially when the structure (e.g., an 
existing bridge) is tested during normal operation. Therefore, identification of modal 
parameters from ambient vibration test data has attracted attention in the last decades, and several 
studies are available in the literature (Beck et al. 1994; Brincker et al. 2000; Peeters and De Roeck 
2001; Brownjohn 2003; Caicedo et al. 2004; Yang et al. 2004; Catbas et al. 2007; Gul and 
Catbas 2008). 

Several methods have been proposed to identify the modal parameters of the structures with 
output-only data. These methods are usually based on the methods discussed in previous sections of 
this text where data are preprocessed with different methodologies to obtain output spectra, unscaled 
free responses, or unscaled FRFs. For example, the peak picking method, which was mentioned in 
the previous sections, can be used for ambient data analysis. However, for output-only analysis, 
the auto- and cross-power spectral densities of the ambient outputs are used instead of FRFs (Ren 
et al. 2004). Another method called frequency domain decomposition has also been used for 
ambient analysis by using SVD of the output spectrum matrix (Brincker et al. 2000; Peeters and 
De Roeck 2001). This method is also referred to as CMIF (Catbas et al. 1997; Peeters and De Roeck 
2001). 

There are a number of different approaches for ambient vibration data analysis. One example 
of time domain methods for ambient data analysis is the ITD used in conjunction with random 
decrement (RD) (Huang et al. 1999). In another approach, Caicedo et al. (2004) combined the 
natural excitation technique with ERA to identify the modal and stiffness parameters. Stochastic 
subspace identification (SSI) is another commonly used method (Van Overschee and De Moor 1996; 
Peeters and De Roeck 2001; Ren et al. 2004), which is based on writing the first-order state space 
equations for a system by using two random terms, i.e., process noise and measurement noise, 
which are assumed to be zero mean and white. After writing the first-order equations, the state 
space matrices are identified by using SVD. Then, the modal parameters are extracted from the 
state space matrices (Peeters and De Roeck 2001). Details and examples for ambient vibration 
data analysis are not presented in this text for the sake of brevity. 


12.2.1.4 Modal Parameters and Damage Detection 


After obtaining the modal parameters, these parameters or their derivatives can be used as damage- 
sensitive metrics. Some of the common modal-parameter-based features may be summarized as the 
natural frequencies, mode shapes and their derivatives, modal flexibility matrix, modal curvature, 
and others. It has been shown that natural frequencies are sensitive to environmental conditions, 
especially to temperature changes, yet they are not sufficiently sensitive to damage. In addition, 
since damage is a local phenomenon most of the time, the lower-frequency modes are usually not 
affected by the damage. The higher-frequency modes may indicate the existence of the damage 
because they generally represent local behavior but it is more difficult to identify those modes 
compared to identification of lower-frequency modes. 

Unlike natural frequencies, which do not usually provide any spatial information, mode shapes 
provide such information and thus they are generally a better indicator of damage than natural 
frequencies. In theory, mode shapes would indicate the location of the damage; however, a dense 
array of sensors may be needed to capture those modes. The modal assurance criterion is one of the 
commonly used modal vector comparison tools (Allemang and Brown 1982). Modal curvature, 
which is usually obtained by taking the second spatial derivative of the mode shapes, has also been used for damage 
detection purposes. Modal flexibility is another damage indicator, which can be obtained by using 
the frequencies and mass-normalized mode shapes. A review of these damage features was given by 
Doebling et al. (1996) and Carden and Fanning (2004). For the examples presented here, modal 
flexibility and modal curvature are obtained from the MIMO data sets with CMIF as summarized 
in Figure 12.5. The details and formulations of these damage features are explained later. 
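
As a small illustration of modal vector comparison, the modal assurance criterion (MAC) mentioned above can be computed as the squared magnitude of the inner product of two mode shape vectors, normalized by their squared norms. The sketch below assumes this standard definition; the function name `mac` and the example vectors are invented for illustration.

```python
def mac(phi_a, phi_b):
    """Modal assurance criterion between two (possibly complex) mode shapes."""
    cross = sum(complex(a).conjugate() * b for a, b in zip(phi_a, phi_b))
    auto_a = sum(abs(a) ** 2 for a in phi_a)
    auto_b = sum(abs(b) ** 2 for b in phi_b)
    return abs(cross) ** 2 / (auto_a * auto_b)

# Hypothetical mode shapes measured at three points.
baseline_shape = [1.0, 2.0, 3.0]
rescaled_shape = [2.0, 4.0, 6.0]      # same shape, different scaling
other_shape = [3.0, 0.0, -1.0]        # orthogonal to the baseline

print(mac(baseline_shape, rescaled_shape))   # same shape
print(mac(baseline_shape, other_shape))      # unrelated shape
```

A MAC value near 1 indicates that two vectors describe the same shape regardless of scaling, whereas a value near 0 indicates unrelated shapes. Since the index is a single number per mode pair, it signals that a shape changed but does not by itself localize the change.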

First developed by Maxwell (1864), flexibility is a displacement influence coefficient 
whose inverse is stiffness. Flexibility is a significant index as it characterizes the input-output 
relationship for a structure. It has been shown to be a robust and conceptual condition index for 
constructed facilities. To find the modal flexibility of a structure, one can use modal parameters 
from dynamic testing. Flexibility has been proposed and shown as a reliable signature reflecting the 
existing condition of a bridge. For this reason, flexibility-based methods in bridge health monitoring 
are promising. Flexibility has been extracted and used in a number of different ways, and further 
studies can be found in Toksoy and Aktan (1994), Catbas and Aktan (2002), Bernal (2002), Bernal 
and Gunes (2004), Alvandi and Cremona (2005), Huth et al. (2005), Catbas et al. (2004, 2006, 
2008a), and Gao and Spencer (2006). If an approximation to real structural flexibility is needed, 
the input force must be known in order to obtain the scaling of the matrix. In addition, it is 
always an approximate index since not all the modes can ever be included in the calculation of the 
flexibility matrix (Catbas et al. 2006). 

Figure 12.5 Obtaining the modal flexibility and modal curvature. 

The derivation of the modal flexibility can be better understood by looking at the FRF between 
points p and q written in partial fraction form as in Equation 12.14: 

$$
H_{pq}(\omega) = \sum_{r=1}^{m}\left[\frac{(A_{pq})_r}{j\omega - \lambda_r} + \frac{(A_{pq}^{*})_r}{j\omega - \lambda_r^{*}}\right]
\tag{12.14}
$$

where 
$H_{pq}(\omega)$ is the FRF at point p due to input at point q 
$\omega$ is frequency 
$\lambda_r$ is the rth complex eigenvalue or system pole 
$(A_{pq})_r$ is the residue for mode r 
$(^{*})$ indicates the complex conjugate 


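Equation 12.14 can be rewritten by using the modal parameters as in Equation 12.15: 

$$
H_{pq}(\omega) = \sum_{r=1}^{m}\left[\frac{\psi_{pr}\psi_{qr}}{M_{A_r}(j\omega - \lambda_r)} + \frac{\psi_{pr}^{*}\psi_{qr}^{*}}{M_{A_r}^{*}(j\omega - \lambda_r^{*})}\right]
\tag{12.15}
$$

where 
$\psi_{pr}$ and $\psi_{qr}$ are the mode shape coefficients at points p and q for the rth mode 
$M_{A_r}$ is the modal scaling for the rth mode 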

Finally, the modal flexibility matrix can be computed by evaluating $H_{pq}(\omega)$ at $\omega = 0$ as in 
Equation 12.16: 

$$
H_{pq}(\omega = 0) = \sum_{r=1}^{m}\left[\frac{\psi_{pr}\psi_{qr}}{M_{A_r}(-\lambda_r)} + \frac{\psi_{pr}^{*}\psi_{qr}^{*}}{M_{A_r}^{*}(-\lambda_r^{*})}\right]
\tag{12.16}
$$

The flexibility formulation is an approximation to the actual flexibility matrix because only a finite 
number of modes can be included in the calculations. The number of modes, m, is to be determined 
such that sufficient modes are selected (i.e., temporal truncation is minimized) and a good 
approximation to actual flexibility is achieved. Catbas et al. (2006) used a modal convergence 
criterion to determine the number of the modes necessary to construct a reliable flexibility matrix. 
In the examples presented here, around 15 modes are used to construct the flexibility matrices for 
each case. The modal flexibility matrix can also be written as in Equation 12.17. After obtaining 
the scaled modal flexibilities, the deflections under static loading can be calculated easily for any 
given loading vector {P}, which is shown in Equation 12.18: 


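$$
[H] =
\begin{bmatrix}
H_{11}(\omega = 0) & H_{12}(\omega = 0) & \cdots & H_{1N}(\omega = 0) \\
\vdots & \vdots & \ddots & \vdots \\
H_{N1}(\omega = 0) & H_{N2}(\omega = 0) & \cdots & H_{NN}(\omega = 0)
\end{bmatrix}
\tag{12.17}
$$

$$
[\text{deflection}] = \{v\} = [H]\{P\}
\tag{12.18}
$$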
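
The effect of modal truncation on the flexibility matrix and on the deflections of Equation 12.18 can be illustrated numerically. The sketch below is a simplification, not the complex-mode formulation of Equation 12.16: it assumes an undamped system with real, mass-normalized mode shapes, for which the modal flexibility reduces to the sum of $\phi_r \phi_r^T / \omega_r^2$ over the included modes. The function name `modal_flexibility` and the 3-DOF spring-mass chain are assumptions for the example.

```python
import numpy as np

def modal_flexibility(omega, phi):
    """Sum of phi_r phi_r^T / omega_r^2 over the supplied mass-normalized modes.

    omega: (n_modes,) natural frequencies in rad/s
    phi:   (n_dof, n_modes) mass-normalized mode shapes
    """
    omega = np.asarray(omega, dtype=float)
    phi = np.asarray(phi, dtype=float)
    return (phi / omega**2) @ phi.T

# Hypothetical 3-DOF chain, fixed at one end (illustrative values).
M = 2.0 * np.eye(3)
K = 1000.0 * np.array([[2.0, -1.0, 0.0],
                       [-1.0, 2.0, -1.0],
                       [0.0, -1.0, 1.0]])

# Generalized eigenproblem via the symmetric form M^(-1/2) K M^(-1/2).
m_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(M)))
lam, vec = np.linalg.eigh(m_inv_sqrt @ K @ m_inv_sqrt)
omega = np.sqrt(lam)
phi = m_inv_sqrt @ vec                     # phi.T @ M @ phi = identity

F_all = modal_flexibility(omega, phi)              # all three modes
F_two = modal_flexibility(omega[:2], phi[:, :2])   # truncated to two modes

P = np.ones(3)                             # uniform unit load vector
v_all = F_all @ P
v_two = F_two @ P
truncation_error = np.linalg.norm(v_two - v_all) / np.linalg.norm(v_all)
print(v_all, truncation_error)
```

With all modes included, the modal flexibility equals the inverse of the stiffness matrix. Dropping the highest mode changes the uniform-load deflection only slightly here because each mode's contribution is divided by its squared frequency, which is why a modest number of lower modes often gives a usable approximation.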


As another damage index, curvatures of modal vectors have been presented in the literature (Pandey 
et al. 1991; Maeck and De Roeck 1999, 2003). In this text, the deflections and curvatures are 
created from the modal flexibility. These indices are implemented here in terms of displacement 
vectors resulting from uniform loads applied to the modal flexibility matrices. However, the 
limitations of the curvature method are to be recognized. First, the spatial resolution (i.e., number 
of sensors) should be sufficient to describe a deflection pattern along a girder line. In order to obtain 
a good approximation to actual flexibility, both dynamic inputs and outputs are to be measured. 
In addition, modal truncation is to be minimized since modal flexibility has to approximate actual 
flexibility. Finally, taking derivatives of the data that include random noise and experimental 
errors might create numerical errors (Chapra and Canale 2002). However, the derivation presented 
in this text is based on the combination of all modes and associated deflections. Therefore, the 
random numerical errors are averaged out and may have less effect than taking the derivative of 
a single-mode shape. 

As given in mechanics theory, curvature and deflection are related for a beam-type structure by 
Equation 12.19: 

$$
v'' = \frac{d^2 v}{dx^2} = \frac{M}{EI}
\tag{12.19}
$$

where 
$v''$ is the curvature at a section 
$M$ is the bending moment 
$E$ is the modulus of elasticity 
$I$ is the moment of inertia 


Since curvature is a function of stiffness, any reduction in stiffness due to damage should be 
observed as an increase in curvature at a particular location. The basic assumption for applying this 
relation to bridges is that the deformation is a beam-type deformation along the measurement line. 
This assumption can yield reasonably good approximation for bridges with girder lines. However, 
if it is to be used for damage identification for a two-way plate-type structure, the curvature 
formulation should be modified to take two directions into account. To calculate the curvature of 
the displacement vectors, the central difference approximation is used for numerical differentiation as 
in Equation 12.20: 

$$
v''_{q,i} = \frac{v_{q-1,i} - 2v_{q,i} + v_{q+1,i}}{\Delta x^{2}}
\tag{12.20}
$$

where 
$q$ represents the elements of the ith displacement vector 
$\Delta x$ is the length between measured displacement points 

12.2.2 Nonparametric Data Analysis Using Time Series Analysis 


Use of statistical pattern recognition methods offers promise for handling large amounts of data. 
Most of the studies focusing on statistical pattern recognition applications on SHM use a combination 
of time series modeling with a statistical novelty detection methodology (e.g., outlier 
detection). One of the main advantages of such methodologies is that they require only the data 
from the undamaged structure in the training phase (i.e., unsupervised learning) as opposed to 
supervised learning where data from both undamaged and damaged conditions are required to 
train the model. The premise of the statistical pattern recognition approach is that as the model is 
trained for the baseline case, new data from the damaged structure will likely be classified as outliers 
in the data. 

Most of these statistical models are used to identify the novelty in the data by analyzing the 
feature vectors, which include the damage-sensitive features. For example, Sohn et al. (2000) used 
a statistical process control technique for damage detection. Coefficients of auto-regressive (AR) 
models were used as damage-sensitive features and they were analyzed by using X-bar control 
charts. Different levels of damage in a concrete column were identified by using the methodology. 
Worden et al. (2000) and Sohn et al. (2001) used Mahalanobis distance-based outlier detection for 
identifying structural changes in numerical models and in different structures. Worden et al. (2000) 
used transmissibility function as damage-sensitive features whereas Sohn et al. (2001) used the 
coefficients of the AR models. Manson et al. (2003) also used similar methodologies to analyze 
data from different test specimens including aerospace structures. 

In another study, Omenzetter and Brownjohn (2006) used auto-regressive integrated moving 
average (ARIMA) models to analyze the static strain data from a bridge during its construction 
and when the bridge was in service. The authors were able to detect different structural changes by 
using the methodology. They also mentioned the limitations of the methodology, for example, it 
was unable to detect the nature, severity, and location of the structural change. Nair et al. (2006) 
used an auto-regressive moving average (ARMA) model and used the first three AR components as 
the damage-sensitive features. The mean values of the damage-sensitive features were tested using a 
hypothesis test involving the t-test. Furthermore, the authors introduced two damage localization 
indices using the AR coefficients. They tested the methodology using numerical and experimental 
results of the ASCE benchmark structure. It was shown that the methodology was able to detect 
and locate different types of damage scenarios for the numerical case. However, the authors 
concluded that more investigations were needed for the analysis of experimental data. 

Another methodology was proposed by Zhang (2007), where the author used a combination 
of AR and ARX (auto-regressive model with eXogenous input) models for damage detection and 
localization. The standard deviation of the residuals of the ARX model was used as the damage-sensitive 
feature. Although the methodology was verified by using a numerical model, the author indicated 
that further studies should be conducted to make the methodology applicable in practice. In a 
recent study, Carden and Brownjohn (2008) used ARMA models and a statistical pattern classifier, 
which uses the sum of the squares of the residuals of the ARMA model. The authors stated that 
the algorithm was generally successful in identifying the damage and separating different damage 
cases from each other. However, the authors noted that the vibration data came from 
forced excitation tests and the methodology may not be applicable to structures with only ambient 
dynamic excitation. 


12.2.2.1 Review and Formulations of Time Series Modeling 


Time series modeling (or analysis) is the statistical modeling of a sequence of data points that are 
observed in time. It has been used in many different fields including structural dynamics and system 
identification. In the following sections, a brief discussion about time series modeling is given. 
A more detailed discussion about the theory of the time series modeling is beyond the scope of this 
text and can be found in the literature (Pandit and Wu 1993; Box et al. 1994; Ljung 1999). 

A linear time series model representing the relationship of the input, output, and the error terms 
of a system can be written with the difference equation shown in Equation 12.21 (Ljung 1999). 
A compact form of this equation is shown in Equation 12.22: 

$$
\begin{aligned}
y(t) + a_1 y(t-1) + \cdots + a_{n_a} y(t-n_a)
= {} & b_1 u(t-1) + \cdots + b_{n_b} u(t-n_b) \\
& + e(t) + d_1 e(t-1) + \cdots + d_{n_d} e(t-n_d)
\end{aligned}
\tag{12.21}
$$

$$
A(q)\,y(t) = B(q)\,u(t) + D(q)\,e(t)
\tag{12.22}
$$

where 
$y(t)$ is the output of the model 
$u(t)$ is the input to the model 
$e(t)$ is the error term 

The unknown parameters of the model are shown with $a_i$, $b_i$, and $d_i$ and the model orders are 
shown with $n_a$, $n_b$, and $n_d$. $A(q)$, $B(q)$, and $D(q)$ in Equation 12.22 are polynomials in the delay 
operator $q^{-1}$ as shown in Equation 12.23. The model shown in Equation 12.22 can also be 
referred to as an ARMAX model (auto-regressive moving average model with eXogenous input), and 
a block diagram of an ARMAX model can be shown as in Figure 12.6. 

$$
\begin{aligned}
A(q) &= 1 + a_1 q^{-1} + a_2 q^{-2} + \cdots + a_{n_a} q^{-n_a} \\
B(q) &= b_1 q^{-1} + b_2 q^{-2} + \cdots + b_{n_b} q^{-n_b} \\
D(q) &= 1 + d_1 q^{-1} + d_2 q^{-2} + \cdots + d_{n_d} q^{-n_d}
\end{aligned}
\tag{12.23}
$$


By changing the model orders of an ARMAX model, different types of time series models can be 
created. For example, if $n_b = n_d = 0$, the model is referred to as an AR model, whereas an ARMA 
model is obtained by setting $n_b$ to zero. Setting $n_d$ to zero gives an ARX model, whose structure is shown in Equation 12.24, 
whereas the block diagram of the model is shown in Figure 12.7. 


$$
A(q)\,y(t) = B(q)\,u(t) + e(t)
\tag{12.24}
$$


Figure 12.6 Block diagram of an ARMAX model. (Adapted from Ljung, L., System Identification: 
Theory for the User, Prentice-Hall, Upper Saddle River, NJ, 1999.) 

Figure 12.7 Block diagram of an AR model. (Adapted from Ljung, L., System Identification: 
Theory for the User, Prentice-Hall, Upper Saddle River, NJ, 1999.) 

Estimating the unknown parameters of a time series model from the input-output data set (i.e., 
system identification) is of importance since the identified model can be used for many different 
purposes including prediction and novelty detection. To estimate the unknown parameters, the 
difference equation of an ARX model can be written as in Equation 12.25: 


$$
y(t) = -a_1 y(t-1) - \cdots - a_{n_a} y(t-n_a) + b_1 u(t-1) + \cdots + b_{n_b} u(t-n_b) + e(t)
\tag{12.25}
$$


Equation 12.25 can be written for the previous time step as in Equation 12.26: 


$$
\begin{aligned}
y(t-1) = {} & -a_1 y(t-2) - \cdots - a_{n_a} y(t-n_a-1) \\
& + b_1 u(t-2) + \cdots + b_{n_b} u(t-n_b-1) + e(t-1)
\end{aligned}
\tag{12.26}
$$

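Considering that these equations can be written for each time step, the equations can be put in a 
matrix form as in Equation 12.27: 

$$
Y = X\theta + E
\tag{12.27}
$$

where 

$$
Y = \begin{bmatrix} y(t) & y(t-1) & \cdots & y(t-n+1) \end{bmatrix}^T
\tag{12.28}
$$

$$
X = \begin{bmatrix}
-y(t-1) & \cdots & -y(t-n_a) & u(t-1) & \cdots & u(t-n_b) \\
-y(t-2) & \cdots & -y(t-n_a-1) & u(t-2) & \cdots & u(t-n_b-1) \\
\vdots & & \vdots & \vdots & & \vdots \\
-y(t-n) & \cdots & -y(t-n_a-n+1) & u(t-n) & \cdots & u(t-n_b-n+1)
\end{bmatrix}
\tag{12.29}
$$

$$
\theta = \begin{bmatrix} a_1 & \cdots & a_{n_a} & b_1 & \cdots & b_{n_b} \end{bmatrix}^T
\tag{12.30}
$$

$$
E = \begin{bmatrix} e(t) & e(t-1) & \cdots & e(t-n+1) \end{bmatrix}^T
\tag{12.31}
$$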

where n is the number of the equations. It is observed that Equation 12.27 is a linear matrix 
equation and the vector $\theta$ containing the unknown parameters can be estimated by using linear 
regression as shown in Equation 12.32. This least squares solution minimizes the sum of the 
squared entries of the error vector E. 


$$
\hat{\theta} = \left(X^T X\right)^{-1} X^T Y
\tag{12.32}
$$
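
The regression in Equations 12.27 through 12.32 can be sketched in a few lines. The example below is illustrative only: the function name `fit_arx`, the simulated second-order system, and its coefficients are assumptions. It also uses NumPy's least squares solver rather than forming the inverse of $X^T X$ explicitly; both give the same estimate when X has full column rank, but the solver is numerically better conditioned.

```python
import numpy as np

def fit_arx(y, u, na, nb):
    """Least squares estimate of ARX coefficients (a_1..a_na, b_1..b_nb)."""
    start = max(na, nb)
    rows = []
    for t in range(start, len(y)):
        past_outputs = [-y[t - k] for k in range(1, na + 1)]
        past_inputs = [u[t - k] for k in range(1, nb + 1)]
        rows.append(past_outputs + past_inputs)       # one row of X
    X = np.array(rows)
    Y = np.asarray(y[start:])
    theta, *_ = np.linalg.lstsq(X, Y, rcond=None)     # minimizes ||Y - X theta||
    return theta[:na], theta[na:]

# Simulate a noise-free ARX(2, 2) system with made-up coefficients.
rng = np.random.default_rng(0)
n = 300
u = rng.standard_normal(n)
a_true = [-1.5, 0.7]
b_true = [1.0, 0.5]
y = np.zeros(n)
for t in range(2, n):
    y[t] = (-a_true[0] * y[t - 1] - a_true[1] * y[t - 2]
            + b_true[0] * u[t - 1] + b_true[1] * u[t - 2])

a_hat, b_hat = fit_arx(y, u, na=2, nb=2)
print(a_hat, b_hat)
```

Because the simulated data contain no error term, the coefficients are recovered essentially exactly. With measured data, the residual of the fit becomes a useful quantity in its own right, as in the residual-based damage features cited earlier in this section.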


12.2.2.2 Time Series Modeling in Conjunction with Novelty Detection 


This section demonstrates the implementation of the time series modeling for novelty detection for 
CIS. The RD method (not presented here, details can be found in Cole [1968], Asmussen [1997], 
and Gul and Catbas [2009]) is used to normalize the data and obtain the pseudo free responses 
from the ambient data. By doing so, the effect of the operational loadings on the data is minimized. 
Therefore, different data sets from different operational conditions can be compared more reliably. 
The methodology is illustrated in Figure 12.8. 
Figure 12.8 Implementing RD to ambient acceleration data before obtaining the feature sets
using time series modeling.

After obtaining the pseudo free response functions, AR models of these free responses are created.
Although a more detailed discussion about time series modeling was given in the previous sections,
a brief discussion about AR models is given here. An AR model estimates the value of a function
at time t based on a linear combination of its prior values. The model order (generally denoted
by p) determines the number of past values used to estimate the value at t (Box et al. 1994). The
basic formulation of a pth-order AR model is defined in Equation 12.33:


$$x(t) = \sum_{j=1}^{p} \varphi_j \, x(t - j\Delta t) + e(t) \tag{12.33}$$


where
x(t) is the time signal
$\varphi_j$ are the model coefficients
e(t) is the error term
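As a rough illustration of Equation 12.33, the AR coefficients of a sampled signal can be estimated by least squares. The sketch below assumes that $\Delta t$ is one sampling interval, so that the lags are whole samples; the function name `fit_ar` and the synthetic decaying oscillation (a stand-in for a pseudo free response) are hypothetical and are not taken from the experiments.

```python
import numpy as np

def fit_ar(x, p):
    """Least squares estimate of the AR(p) coefficients phi_1..phi_p."""
    x = np.asarray(x, dtype=float)
    # Column j holds the signal delayed by j samples: x(t-1), ..., x(t-p)
    X = np.column_stack([x[p - j:len(x) - j] for j in range(1, p + 1)])
    target = x[p:]  # x(t) for every t that has p past values
    phi, *_ = np.linalg.lstsq(X, target, rcond=None)
    return phi

# A damped cosine satisfies an AR(2) recursion exactly, which makes it a handy check
t = np.arange(400)
x = np.exp(-0.01 * t) * np.cos(0.3 * t)
phi = fit_ar(x, p=2)
```

For this signal the recursion is $x(t) = 2r\cos(\omega)\,x(t-1) - r^2 x(t-2)$ with $r = e^{-0.01}$ and $\omega = 0.3$, so the estimated coefficients can be compared against these closed-form values. The resulting coefficient vector plays the role of the feature vector passed on to outlier detection.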


After the coefficients of the AR models are obtained for the different data sets, they are fed to
the outlier detection algorithm, where the Mahalanobis distance between the two data sets is
calculated. Outlier detection can be considered as the detection of clusters that deviate from
other clusters to such an extent that they are assumed to be generated by another system or
mechanism. It is one of the most common pattern recognition concepts applied to SHM problems.
In this section, Mahalanobis distance-based outlier detection is used to detect novelty in the data.

The outlier detection problem for univariate (1D) data is relatively straightforward (e.g., the 
outliers can be identified from the tails of the distribution). There are several discordance tests, but 
one of the most common is based on deviation statistics and it is given by Equation 12.34: 


$$z_i = \frac{|d_i - \bar{d}|}{\sigma} \tag{12.34}$$

where
$z_i$ is the outlier index for univariate data
$d_i$ is the potential outlier
$\bar{d}$ and $\sigma$ are the sample mean and standard deviation, respectively


The multivariate equivalent of this discordance test for an n × p data set (n is the number of
feature vectors and p is the dimension of each vector) is known as the Mahalanobis squared distance
(Mahalanobis 1936). The Mahalanobis squared distance will be referred to as the Mahalanobis
distance after this point, and it is given by Equation 12.35:


$$Z_i = (x_i - \bar{x})^T \Sigma^{-1} (x_i - \bar{x}) \tag{12.35}$$


where
$Z_i$ is the outlier index for multivariate data
$x_i$ is the potential outlier vector
$\bar{x}$ is the sample mean vector
$\Sigma$ is the sample covariance matrix

By using the earlier equations, an outlier can be detected when the Mahalanobis distance of a data
vector is higher than a preset threshold level. Determining this threshold value is critical, and
different frameworks such as the one presented by Worden et al. (2000) can be used.
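The distance in Equation 12.35 and the threshold comparison can be sketched as follows. This is only an illustrative sketch: the function name `mahalanobis_sq`, the tiny two-dimensional data set, and the threshold of 2.0 are made up for the example, and the sample mean and covariance are assumed to be computed from the baseline feature vectors only.

```python
import numpy as np

def mahalanobis_sq(baseline, candidates):
    """Squared Mahalanobis distance of each candidate row from the baseline set."""
    baseline = np.asarray(baseline, dtype=float)
    candidates = np.atleast_2d(np.asarray(candidates, dtype=float))
    mean = baseline.mean(axis=0)                          # sample mean vector
    cov = np.atleast_2d(np.cov(baseline, rowvar=False))   # sample covariance (p x p)
    diff = candidates - mean
    # Solve cov * z = diff^T instead of inverting the covariance matrix explicitly
    solved = np.linalg.solve(cov, diff.T).T
    return np.einsum("ij,ij->i", diff, solved)            # row-wise diff . solved

# Four baseline vectors with mean (1, 1) and covariance (4/3) I
baseline = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
candidates = np.array([[1.0, 1.0], [3.0, 1.0]])
distances = mahalanobis_sq(baseline, candidates)
flags = distances > 2.0   # arbitrary threshold, for illustration only
```

In the application described here, each row would be a vector of AR coefficients, and the number of baseline vectors must exceed the vector dimension for the covariance matrix to be invertible.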


12.2.2.3 Damage Detection with Time Series Modeling Using Sensor Clustering


In this section, a modified time series modeling for damage detection and localization will be 
described. As the starting point, the equation of motion for an N degrees of freedom (DOF) linear 
dynamic system can be written as in Equation 12.36: 


$$M\ddot{x}(t) + C\dot{x}(t) + Kx(t) = f(t) \tag{12.36}$$


where
$M \in \mathbb{R}^{N \times N}$ is the mass matrix
$C \in \mathbb{R}^{N \times N}$ is the damping matrix
$K \in \mathbb{R}^{N \times N}$ is the stiffness matrix


The vectors $\ddot{x}(t)$, $\dot{x}(t)$, and $x(t)$ are acceleration, velocity, and displacement,
respectively. The external forcing function on the system is denoted by f(t). The same equation
can be written in matrix form as shown in Equation 12.37 (t for time is omitted):


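$$\begin{bmatrix} m_{11} & \cdots & m_{1N} \\ \vdots & \ddots & \vdots \\ m_{N1} & \cdots & m_{NN} \end{bmatrix}
\begin{Bmatrix} \ddot{x}_1 \\ \vdots \\ \ddot{x}_N \end{Bmatrix}
+ \begin{bmatrix} c_{11} & \cdots & c_{1N} \\ \vdots & \ddots & \vdots \\ c_{N1} & \cdots & c_{NN} \end{bmatrix}
\begin{Bmatrix} \dot{x}_1 \\ \vdots \\ \dot{x}_N \end{Bmatrix}
+ \begin{bmatrix} k_{11} & \cdots & k_{1N} \\ \vdots & \ddots & \vdots \\ k_{N1} & \cdots & k_{NN} \end{bmatrix}
\begin{Bmatrix} x_1 \\ \vdots \\ x_N \end{Bmatrix}
= \begin{Bmatrix} f_1 \\ \vdots \\ f_N \end{Bmatrix} \tag{12.37}$$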


The equality in Equation 12.38 is obtained if the first row of Equation 12.37 is written separately. 
By rearranging Equation 12.38, it is seen in Equation 12.39 that the output of the first DOF can 
be written in terms of the excitation force on first DOF, the physical parameters of the structure, 
and the outputs of the other DOFs (including itself). Furthermore, in case of free response, the 
force term can be eliminated and the relation is written as shown by Equation 12.40: 


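$$(m_{11}\ddot{x}_1 + \cdots + m_{1N}\ddot{x}_N) + (c_{11}\dot{x}_1 + \cdots + c_{1N}\dot{x}_N) + (k_{11}x_1 + \cdots + k_{1N}x_N) = f_1 \tag{12.38}$$

$$\ddot{x}_1 = \frac{f_1 - (m_{12}\ddot{x}_2 + \cdots + m_{1N}\ddot{x}_N) - (c_{11}\dot{x}_1 + \cdots + c_{1N}\dot{x}_N) - (k_{11}x_1 + \cdots + k_{1N}x_N)}{m_{11}} \tag{12.39}$$

$$\ddot{x}_1 = -\frac{(m_{12}\ddot{x}_2 + \cdots + m_{1N}\ddot{x}_N) + (c_{11}\dot{x}_1 + \cdots + c_{1N}\dot{x}_N) + (k_{11}x_1 + \cdots + k_{1N}x_N)}{m_{11}} \tag{12.40}$$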


It is seen from Equation 12.40 that if a model is created to predict the output of the first DOF by
using the DOFs connected to it (neighbor DOFs), a change in this model can reveal important
information about a change in the properties of that part of the system. Similar equalities can be
written for each row of Equation 12.37, and a different model can be created for each equation.
Each row of Equation 12.37 can be considered as a sensor cluster with a reference DOF and its
neighbor DOFs. The reference DOF for Equation 12.40, for example, is the first DOF, and the
neighbor DOFs are the DOFs that are directly connected to the first DOF. Therefore, a separate
linear time series model can be created for each sensor cluster, and changes in these models can
point to the existence, location, and severity of the damage. The details of the methodology are
explained in the following sections.

As explained in the previous chapter, a general form of a time series model can be written as in 
Equation 12.41, and an ARX model is shown in Equation 12.42: 


$$A(q)y(t) = B(q)u(t) + D(q)e(t) \tag{12.41}$$
$$A(q)y(t) = B(q)u(t) + e(t) \tag{12.42}$$


The core of the methodology presented in this part is to create different ARX models for different 
sensor clusters and then extract damage-sensitive features from these models to detect the damage. 
In these ARX models, the y(t) term is the acceleration response of the reference channel of a sensor
cluster, the u(t) term is defined with the acceleration responses of all the DOFs in the same cluster,
while e(t) is the error term. Equation 12.43 shows an example ARX model to estimate the first
DOF's output by using the other DOFs' outputs for a sensor cluster with k sensors:


$$A(q)\ddot{x}_1(t) = B(q)\begin{bmatrix} \ddot{x}_1(t) & \ddot{x}_2(t) & \cdots & \ddot{x}_k(t) \end{bmatrix}^T + e(t) \tag{12.43}$$


To explain the methodology schematically, a simple three-DOF model is used as an example. 
Figure 12.9 shows the first sensor cluster for the first reference channel. The cluster includes first 
and second DOFs since the reference channel is connected only to the second DOF. The input 
vector u of the ARX model contains the acceleration outputs of first and second DOFs. The output 
of the first DOF is used as the output of the ARX model as shown in the figure. When the second 
channel is the reference channel, Figure 12.10, the sensor cluster includes all three DOFs since they 
are all connected to the second DOF. The outputs of the first, second, and third DOFs are used as 
the input to the ARX model and then the output of the second DOF is used as the output of this 
model. Likewise, for reference channel three (Figure 12.11), the inputs to the ARX model are
the outputs of the second and third channels, and the output of the model is the third channel itself.

Figure 12.9 Creating different ARX models for each sensor cluster (first sensor cluster).
Figure 12.10 Creating different ARX models for each sensor cluster (second sensor cluster).
Figure 12.11 Creating different ARX models for each sensor cluster (third sensor cluster).

The equations of the ARX models created for the example system are shown in Equations 12.44
through 12.46:


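$$A_1(q)\ddot{x}_1(t) = B_1(q)\begin{bmatrix} \ddot{x}_1(t) & \ddot{x}_2(t) \end{bmatrix}^T + e_1(t) \tag{12.44}$$
$$A_2(q)\ddot{x}_2(t) = B_2(q)\begin{bmatrix} \ddot{x}_1(t) & \ddot{x}_2(t) & \ddot{x}_3(t) \end{bmatrix}^T + e_2(t) \tag{12.45}$$
$$A_3(q)\ddot{x}_3(t) = B_3(q)\begin{bmatrix} \ddot{x}_2(t) & \ddot{x}_3(t) \end{bmatrix}^T + e_3(t) \tag{12.46}$$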


After creating the ARX models for the baseline condition, different approaches may be implemented 
for detecting damage. For example, comparison of the coefficients of the ARX models for each 
sensor cluster before and after damage can give information about the existence, location, and 
severity of the damage. For the approach adopted here, the fit ratios (FRs) of the baseline ARX
model when used with new data are employed as a damage-sensitive feature. The difference between
the FRs of the models is used as the damage feature (DF) (Gul and Catbas 2011). The FR of an
ARX model is calculated as given in Equation 12.47:


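$$\text{Fit Ratio (FR)} = \left(1 - \frac{\left|\{y\} - \{\hat{y}\}\right|}{\left|\{y\} - \{\bar{y}\}\right|}\right) \times 100 \tag{12.47}$$


where
$\{y\}$ is the measured output
$\{\hat{y}\}$ is the predicted output
$\{\bar{y}\}$ is the mean of $\{y\}$
$\left|\{y\} - \{\hat{y}\}\right|$ is the norm of $\{y\} - \{\hat{y}\}$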


The DF is calculated by using the difference between the FRs for healthy and damaged cases as 
given in Equation 12.48: 


$$\text{Damage Feature (DF)} = \frac{\text{FR}_{\text{healthy}} - \text{FR}_{\text{damaged}}}{\text{FR}_{\text{healthy}}} \times 100 \tag{12.48}$$
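Equations 12.47 and 12.48 can be evaluated with a few lines of code. The sketch below assumes the Euclidean norm (the text only says "norm"); the function names and the numerical values are hypothetical and serve only to show the two limiting cases of the fit ratio.

```python
import math

def fit_ratio(measured, predicted):
    """Fit ratio (Eq. 12.47) in percent, assuming the Euclidean norm."""
    mean = sum(measured) / len(measured)
    err = math.sqrt(sum((y - yh) ** 2 for y, yh in zip(measured, predicted)))
    spread = math.sqrt(sum((y - mean) ** 2 for y in measured))
    return (1.0 - err / spread) * 100.0

def damage_feature(fr_healthy, fr_damaged):
    """Damage feature (Eq. 12.48) in percent."""
    return (fr_healthy - fr_damaged) / fr_healthy * 100.0

measured = [0.0, 1.0, 0.0, -1.0]
fr_perfect = fit_ratio(measured, measured)      # prediction equals the measurement
fr_mean_only = fit_ratio(measured, [0.0] * 4)   # prediction equals the signal mean
df = damage_feature(90.0, 45.0)                 # made-up FR values for illustration
```

A perfect prediction gives an FR of 100, while a prediction no better than the signal mean gives an FR of 0. An FR that drops from 90 to 45 when the baseline model is applied to new data corresponds to a DF of 50.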


12.3 Laboratory Studies 
12.3.1 Steel Grid Structure 


Before the routine applications of SHM systems to real-life structures, the methodologies should be 
verified on analytical and physical models. Although analytical studies are necessary in the first phases 
of verification, laboratory studies with complex structures are also essential. Laboratory studies with 
large physical models are a vital link between the theoretical work and field applications if these 
models are designed to represent real structures where various types and levels of uncertainties can 
be incorporated. 

For this section, data from a steel grid structure is employed for the experimental verifications 
of the methods discussed in this text. This model is a multipurpose specimen enabling researchers 
to try different technologies, sensors, algorithms, and methodologies before real-life applications. 
The physical model has two clear spans with continuous beams across the middle supports. It has 
two 18 ft girders (S3 x 5.7 steel section) in the longitudinal direction. The 3 ft transverse beam 
members are used for lateral stability. The grid is supported by 42 in. tall columns (W12 x 26 
steel section). The grid is shown in Figure 12.12 and more information about the specimen can be 
found in Catbas et al. (2008b). 

A very important characteristic of the grid structure is that it can be easily modified for different 
test setups. For example, with specially designed connections (Figure 12.13), various damage cases 
can be simulated. In addition, several different boundary conditions and damage cases (e.g., pin 
supports, rollers, fixed support, and semi-fixed support) can be simulated by using the adjustable 
connections. 

The grid structure can be instrumented with a number of sensors for dynamic and static 
tests. For the dynamic tests that are in the scope of this text, the grid is instrumented with 12 
accelerometers in the vertical direction at each node (all the nodes except N7 and N14 in Figure 12.14).
The accelerometers used for the experiments are ICP/seismic-type accelerometers (Figure 12.15)
with a 1000 mV/g sensitivity, 0.01-1200 Hz frequency range, and ±2.5 g measurement range.

Figure 12.13 CAD drawing and representative pictures showing the details of the grid.


To record the dynamic response, an acquisition system from VXI and Agilent Technologies is used. 
The MTS-Test software package was used for acquisition control of the impact tests. 

For the impact tests, the grid was excited at nodes N2, N5, N6, and N12, and five averages were
used to obtain the FRFs, as suggested in the literature. The sampling frequency is 400 Hz. For
the impact tests, an exponential window is applied to both input and output data sets whereas a 
force window is applied only to the input set. Both time history and FRF data from MTS software 
were recorded. The ambient vibration was created by random tapping of two researchers with 
fingertips simultaneously. The researchers were continuously moving around the structure to make 
the excitation as random as possible. The ambient data were recorded by using VXI DAQ Express 
software. 


Figure 12.14 Node numbers for the steel grid. 


Figure 12.15 Accelerometers used in the experiments. 


12.3.1.1 Damage Simulations 


A number of different damage scenarios were applied to the grid. These damage cases are simulated 
to represent some of the problems commonly observed by bridge engineers and Department of 
Transportation officials (Burkett 2005). Two different damage cases investigated here are summa- 
rized in the following text. One of these damage cases is devoted to local stiffness loss whereas the 
other case is simulated for boundary condition change. 

Baseline case (BC0): Before applying any damage, the structure is tested to generate the baseline
data so that the data from the unknown state can be compared to the baseline data for damage 
detection. 

Damage case 1 (DC1): Moment release and plate removal at N3: DC1 simulates a local stiffness
loss. The bottom and top gusset plates at node N3 are removed in addition to all bolts at the
connection (Figure 12.16). This is an important damage case especially for CIS applications since
gusset plates are critical parts of steel structures. Furthermore, it has been argued that inadequate
gusset plates might have contributed to the failure of the I-35W Mississippi River Bridge (Holt
and Hartmann 2008).

Figure 12.16 Plate removal at N3 for DC1.

Damage case 2 (DC2): Boundary restraint at N7 and N14: DC2 is created to simulate some 
unintended rigidity at a support caused by different reasons such as corrosion. The oversized 
through-bolts were used at N7 and N14 to introduce fixity at these two supports (Figure 12.17). 
Although these bolts can create considerable fixity at the supports, it should be noted that these 
bolts cannot guarantee a perfect fixity. 


12.3.2 Damage Detection Results Using Parametric Methods 


Generally, the first step in damage detection is to define the baseline state. Therefore, the damage 
features are first evaluated for the healthy case. The modal parameters of the healthy structure 
are identified by using CMIF that was outlined in the previous sections. Sample data, FRFs and 
the CMIF plot for BC0 are shown in Figures 12.18 through 12.20. There are 17 vertical modes
identified for this case and the first 10 vertical modes are shown in Figure 12.21. After the modal 
parameters were identified, the modal flexibility was calculated. The deflection profile obtained 
with modal flexibility is shown in Figure 12.22. The deflected shape is obtained by applying a 100 
lb uniform load to the measurement locations (i.e., 100 lb at each node). 

After obtaining the deflections, the curvature is obtained by using the deflected shapes as
shown in Equation 12.20. The modal curvature of the baseline case is shown in Figure 12.23. It
should be noted that the spatial resolution of the sensors affects the quality of the curvature
data considerably. A denser sensor array would further improve the results. However, the sensor
spatial resolution in this study is defined so that it represents a feasible sensor distribution for
real-life applications on short- and medium-span bridges. Another consideration for modal curvature
calculation is that there is no curvature value at the beginning and end measurement locations due
to the numerical approximation of the central difference formula. The curvatures at the supports are
assumed to be zero since the roller supports cannot resist moment, which means that there cannot
be any curvature at these points. Finally, the curvature plot is very similar to a moment
diagram (M/EI) for a girder under uniform load. As such, the interpretation and evaluation of the
curvature plot become very intuitive for structural engineers, as in the case of deflected shapes.

Figure 12.17 Boundary fixity at N7 and N14 for DC2.
Figure 12.18 Sample data from impact testing for BC0.
Figure 12.19 Sample FRF data obtained from impact testing for BC0.
Figure 12.20 CMIF plot for BC0 obtained with impact testing.
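The curvature calculation discussed above can be illustrated with a short sketch. Equation 12.20 is not repeated in this section, so the code assumes the standard second-order central difference with uniform spacing between measurement points, and it assumes that the first and last entries of the deflection list are support locations where the curvature is set to zero. The function name and the sample deflections are hypothetical.

```python
def curvature_from_deflection(deflection, spacing):
    """Central-difference curvature along a girder; end values are set to zero."""
    n = len(deflection)
    curvature = [0.0] * n  # the two end points keep the assumed zero curvature
    for i in range(1, n - 1):
        curvature[i] = (
            deflection[i + 1] - 2.0 * deflection[i] + deflection[i - 1]
        ) / spacing ** 2
    return curvature

# Deflections sampled from v = x^2 at x = 0, 1, 2, 3, 4 (exact curvature is 2)
kappa = curvature_from_deflection([0.0, 1.0, 4.0, 9.0, 16.0], spacing=1.0)
```

Because the central difference needs one neighbor on each side, only the interior points receive a computed value, which is why a denser sensor array improves the curvature estimate.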


Damage case 1 (DC1): Moment release and plate removal at N3: When the same procedure is
repeated for DC1, it is observed in Figure 12.24 that the maximum deflection change is obtained
at the damage location (N3) and is around 2.8%. The changes at the other nodes are around
1%-2%. Although this change may possibly be considered as an indicator of the damage that occurred
at N3 for this laboratory case, it should also be noted that a 3% change in the flexibility coefficients
might not be a clear indicator of damage, especially for real-life applications. Looking at the
curvature comparisons, Figure 12.25 shows that the maximum curvature change is around 10.7%
and is obtained at the damage location. The changes in the curvature for other points are within
1%-3% except at N12, where a 4.7% increase in the curvature is computed.

Figure 12.21 First 10 vertical modes for BC0 obtained with impact testing.
Figure 12.22 Deflection profile obtained using the modal flexibility for BC0.

Damage case 2 (DC2): Boundary restraint at N7 and N14: For DC2, the damage can be visually
observed from the deflection patterns as seen in Figure 12.26. It should be noted that this damage
case can be considered as a symmetric damage case (both N7 and N14 are restrained) and thus a
clear change in the deflected shapes is observable for both of the girders. The deflection is reduced by
about 30%-50% at the span where the joint restraint damage scenario was implemented.

Figure 12.23 Curvature profile obtained using the modal curvature for BC0.
Figure 12.24 Deflection comparison for BC0 and DC1.

Curvature from the deflected shapes was determined subsequently for DC2. As was mentioned 
before, the curvature at the roller supports was assumed zero for pin-roller boundary conditions. 
For the restrained case, however, this assumption is not correct for N7 and N14 since the moment 
at these fixed supports is nonzero. However, for visualization purposes, the curvatures at N7 and 
N14 are still assumed as zero. It is seen from Figure 12.27 that a clearly observable 30% decrease 
in the curvature exists near the damage location. Here, we see a decrease in the curvature because 
the structure with restrained support is stiffer than the baseline with roller supports. It is clear that 
a finer resolution of sensors, especially around the end supports, would yield more accurate results 
in terms of curvature. 
Figure 12.25 Curvature comparison for BC0 and DC1.
Figure 12.26 Deflection comparison for BC0 and DC2.
Figure 12.27 Curvature comparison for BC0 and DC2.

12.3.3 Damage Detection Results Using Nonparametric Methods
12.3.3.1 AR in Conjunction with Novelty Detection 


For this section, ambient vibration data from the test grid created by random hand tapping of two 
researchers simultaneously are used. Sampling rate is 400 Hz for the experiments. There are 23 data 
blocks for each case. The acceleration data are averaged by using RD where the reference channel 
for RD process is node 2 (the location of node 2 can be seen in Figure 12.28). The model order for 
the AR models has been determined to be 10. The threshold is calculated as 180. 

Baseline case (BC0): Before the analysis of the data from the damage cases, it is investigated
whether the data from the baseline (healthy) grid structure are under the determined threshold 
value or not. Figure 12.28 shows analysis results of the baseline data acquired on the same day (first 
and second half of one data set). It is seen from the figure that all the values are under the threshold 
value (no false positives). This indicates that the numerically evaluated threshold value is consistent 
with the experimental results. 

Damage case 1 (DC1): Moment release and plate removal at N3: Figure 12.29 shows the same
plots for DC1. For this case, it is observed that the features for the second data set are clearly
separated from the features from the baseline case. This shows that the damage applied at N3 can
be identified by using the methodology. However, there is no clear information about the location
of the damage since approximately the same amount of separation is obtained at all nodes.

Figure 12.28 Verification of the threshold value with experimental data.
Figure 12.29 Mahalanobis distance plots for DC1.


Damage case 2 (DC2): Boundary restraint at N7 and N14: As for DC2 (Figure 12.30), the 
majority of the values are still above the threshold value; however, some false negatives are also 
observed. These results are somewhat surprising since the severe damage at the boundaries should 
be detected with a smaller number of false negatives. 

The results presented in the preceding sections show that the methodology is capable of detecting 
changes in the test structures for most of the cases. However, the methodology does not provide 
enough information to locate the damage. It should also be noted that there are a number of 
issues to be solved before the methodology can be successfully applied to real-life structures in an 
automated SHM system. 

For example, it was noted that determining the right threshold value is one of the important
issues to solve. The threshold value depends on both the number of feature vectors and the
length of each feature vector. Therefore, a different threshold value might be obtained when
a different model order is used since the AR model order determines the length of the feature vector.
It was also noted that determining this threshold is not a trivial problem, and it might require 
some trial and error process during the monitoring process. If the threshold is set too low, most 
of the healthy data can be identified as outliers, increasing the number of the false positives. On 
the contrary, if the threshold value is set too high, data from damaged structure can be classified as 
inliers, which is not a desired situation, either. Further investigation is needed for demonstration of 
the threshold value for an automated SHM application. Second, determining the order of the AR
model in an automated SHM system might be difficult. A high model order (and therefore a longer
feature vector) might be required for complex structures, and this can make it more difficult
to identify the outliers because of "the curse of dimensionality." For example, the model order p
for the grid structure is 10, and this number might be considerably higher for a real-life
structure such as a long-span bridge.

Figure 12.30 Mahalanobis distance plots for DC2.


12.3.3.2 ARX with Sensor Clustering 


For this part, the free responses that are obtained from the impact tests are used. The first 100 
points (out of 4096 data points), which cover the duration of the impulse, of each data set are 
removed so that the impact data can be treated as free responses. Five impact data sets were used 
for each case. 

Damage case 1 (DC1): Moment release and plate removal at N3: For DC1, Figure 12.31 shows
that the DFs for N3 are considerably higher than the threshold and those of the other nodes. This
is due to the plate removal at this node. It is also observed from the figure that the DFs for N2 are
relatively high since N2 is the closest neighbor of N3. The secondary effect of the damage on
N5 and N10 is also seen. Therefore, the methodology was very successful at detecting and locating
the damage for this experimental damage case. Finally, note that the DFs for the other nodes are
around the threshold, showing that these nodes are not affected significantly by the localized damage.

Figure 12.31 DFs for DC1.
Figure 12.32 DFs for DC2.


Damage case 2 (DC2): Boundary restraint at N7 and N14: The results for the second damage 
case, DC2, are presented in Figure 12.32. It is seen that the DFs for N6 and N13 are considerably 
higher than the other nodes. This is because of the fact that they are the closest nodes to the 
restrained supports (N7 and N14). The DFs for N5 and N12 are also high since they are also 
affected by the damage. The DFs for the remaining nodes are also slightly higher than the threshold 
since the structure is changed globally for DC2. 

12.4 Summary and Concluding Remarks


The main objective of this chapter is to provide a brief review of different signal processing 
techniques for damage detection in the context of SHM with applications to CIS. Presented 
discussions can mainly be summarized in three parts: (1) the evaluation of parametric methods and 
damage features, (2) investigation of nonparametric methods and features for damage detection, 
and (3) demonstration of the effectiveness of the methodologies with laboratory experiments. 

For the parametric damage evaluation approaches, a general outline for different data analysis and 
feature extraction methods is discussed. Examples of modal parameter-based parametric damage 
features such as modal flexibility and modal curvature are presented, and formulations for extracting 
these parameters from vibration data are presented. Afterward, statistical pattern recognition 
approaches for nonparametric damage detection are presented. Time series analysis along with its 
implementation with outlier detection for damage detection is discussed. These methodologies 
can be considered as a complement to commonly employed damage detection methods. A sensor 
cluster-based time series modeling is also discussed as a powerful damage detection methodology. 

After discussing the theoretical background, the performances of these damage detection methods
and damage features are exemplified by using experimental data from a steel grid for different
damage scenarios. The experimental studies show that these methodologies perform successfully
for most of the cases. However, it is also noted that the success rates of the techniques may differ
for different cases. One important point here is that a particular data analysis method may not be
able to answer every SHM problem. Therefore, a combination of different approaches should be
adopted for a successful and reliable SHM system.

Based on the authors' experience, one critical challenge in SHM research is the influence of
environmental and operational conditions on the structure, and therefore on the data. The damage
detection process may easily become very complicated if there is a considerable change in
the operational and environmental conditions during the data collection process. Therefore, robust
methodologies for elimination of these external effects should be developed and combined with
damage detection methodologies.

It is also seen from the results that for an automated SHM system, it might be necessary to 
set certain rules about the number of outliers so that a decision can be made whether damage 
has occurred in the structure or not. For example, if a certain number of the new data points are 
determined as outliers as opposed to a single outlier, then it might indicate a possible structural 
change where further precautions may be necessary. After determining that a structural change has 
occurred, more rigorous analyses can be conducted by using different methodologies to determine 
the location, severity, and the nature of the change. 
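One simple form of such a rule is to raise an alarm only when a minimum number of new data points exceed the threshold. The sketch below is an assumption-laden illustration and not a rule proposed in this chapter: the function name, the minimum count of two, and the distance values are made up, while the threshold of 180 is the value reported earlier for the grid experiments.

```python
def structural_change_suspected(distances, threshold, min_outliers):
    """Return True when at least min_outliers distances exceed the threshold."""
    outlier_count = sum(1 for d in distances if d > threshold)
    return outlier_count >= min_outliers

# Made-up Mahalanobis distances for five new data blocks
new_distances = [100.0, 250.0, 120.0, 300.0, 90.0]
alarm = structural_change_suspected(new_distances, threshold=180.0, min_outliers=2)
```

Requiring several outliers instead of one trades a slower alarm for fewer false positives; the appropriate count would have to be chosen for each monitoring application.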

Finally, it is emphasized that laboratory studies with large, complex, and redundant test specimens
are a critical link between the theoretical work and field applications. The reliability of the
methods should be verified for damage detection for different cases, loadings, and structures. After
making sure that it can be used for a variety of (laboratory and real-life) structures under different
loading and environmental conditions, the methodology can be implemented in different sensor
networks. Embedding these algorithms in different sensor networks for an automated data analysis
process will facilitate better management of CIS in terms of safe and cost-effective operations.

13.1 Introduction


A wireless sensor network (WSN) [1] is a special kind of peer-to-peer (P2P) network where the
nodes communicate with the sink wirelessly to transmit the sensed information. Today, a WSN
utilizes technological advancements in low-power communications and very large scale
integration (VLSI) [2] to support the functionalities of sensing, processing, and communications.
WSNs have penetrated all walks of life, ranging from health care to environmental
monitoring to defense-related applications.

A WSN is often accompanied by an application that uses sensed observations to perform a certain
task. For example, a WSN may be designed to gather and process data from the environment to
regulate the temperature level of a switching station. A fundamental issue in WSN is the way the
data are collected, processed, and used. Specifically, in the case of data processing, the initial step is
to verify whether the data are correct or conform to a specified set of rules. In this context, data
cleaning for WSN arises as a discipline that is concerned with the identification and rectification
of errors given the nodes' inability to handle complex computations and their low energy resources.
In a nutshell, data cleaning in WSN can be defined as the process of detecting and correcting
inaccurate sensor data by replacing, modifying, or deleting the erroneous data.

The challenging aspect of data cleaning in WSN is that the cleaning mechanism is online; that
is, the cleaning process is carried out on the continuous stream of data generated [3]. After the
cleaning process is executed, ideally the data stream becomes consistent, free from errors, and
suitable for the applications to use as a basis for decisions. However, in spite of technological
advances in sensor processing, no perfect data cleaning mechanism exists for WSN so far.

WSNs are generally deployed in environments where nodes are exposed to nonideal conditions 
as a result of which the nodes’ ability to accurately record or relay the information is hampered [4,5]. 
For example, a WSN may be asked to report the temperature or other environmental attributes of
a nuclear reactor where nuclear/electromagnetic radiation makes it difficult for the nodes to precisely
record the observations and relay them. Therefore, before deploying a WSN the possibility of
imprecision (or even loss depending on the surroundings) of sensor measurements should be 
considered. Sometimes, even in case of ideal conditions, sensors may not perfectly observe the 
environment they monitor. This is due to the hardware inaccuracies, imprecision, and imperfection 
of the sensing mechanism imparted when the device is manufactured. More specifically in this case, 
the imperfections represent the inability of technology to perfectly manufacture a sensing device. 

WSNs also tend to malfunction because of the natural or man-made conditions they operate in. For
example, a sensing node monitoring the water level of a levee might get washed away by floods,
or a node monitoring wind speed might be blown away by hurricane winds. Sensor
nodes under severe conditions tend to perform poorly, and the breakdown of a single node
might compromise the overall perception and/or communication capability of the network.
The perception ability of a sensing node corresponds to the extent of exposure [6]. 

A WSN also faces limitations due to the position and time of operation of its constituent nodes.
The sensing range of a node might not cover the entire region required by the application. For
example, a temperature monitoring node might only partially cover the area it is required to observe.
Also, a node cannot be active all the time because of its limited energy. As a result,
the sensing operation of the node is activated and deactivated based on a sampling rate defined to
make sure that no relevant event is missed. For example, a node monitoring the movement
of an object might not be active when the object passes through its sensing region. Spatial [7] and
temporal [8] coverage in WSNs has been explored in different scenarios, from target tracking [9]
to node scheduling [10,11], to address the aforementioned drawback.

This chapter discusses a paradigm for validating the extent of spatiotemporal associations among
data sources to enhance data cleaning by means of pairwise similarity verification
techniques. The primary work described in this chapter establishes belief [12] in potential sensor
nodes of interest to clean data by combining the time of arrival of data at the sink and the extent of
spatiotemporal associations among the sensor nodes.
13.2 Background


Data cleaning in WSN essentially suppresses the drawbacks and shortcomings mentioned in
Section 13.1. One of the major challenges of data cleaning in WSN is that the cleaning process
has to be performed online on the incoming stream of data, depending on the requirements of the
application query. The fundamental features of data cleaning in WSN involve outlier detection,
adaptation, and estimation.

An outlier is an erroneous observation whose value lies at an abnormal distance from other samples
in an expected data set. Subramaniam et al. [13] classified outliers as distance based
or local metric based. Distance-based outliers do not require any prior knowledge of the underlying
data distribution, but rather use the intuitive explanation that an outlier is an observation that is
sufficiently far from most other observations in the data set. A local metric-based outlier is observed
when the neighborhood count of a sample differs significantly from that of its neighbors.
Local metric-based outliers utilize the spatial associations between the nodes. Jeffery et al. [14]
classified outliers for Radio-Frequency Identification (RFID) systems as false positives (a reported
observation that indicates the existence of a tag when it does not exist) and false negatives (an
absence of a reading where a tag exists but is not reported). The aim of any data cleaning mechanism
is to suppress false positives and regenerate false negatives.

Many data cleaning approaches have been presented in literature to cleanse the corrupted data 
from WSN to deal with outliers and replenish the data stream with an appropriate value. 

The first independent data cleaning mechanism, Extensible Sensor stream Processing (ESP) for
WSN, was proposed by Jeffery et al. [15,16] in collaboration with a research group responsible
for developing the Stanford data stream management system (STREAM) [17-19]. ESP is essentially a
framework for building sensor data cleaning infrastructures for use in pervasive applications. ESP’s
pipeline approach (as shown in Figure 13.1) has been designed in such a way that it can easily be
integrated into STREAM. ESP accommodates declarative queries from STREAM and performs
cleaning based on spatial and temporal characteristics of sensor data. The incoming data stream is passed
sequentially through the different stages of ESP to remove unreliable data and generate lost data.

ESP systemizes the cleaning of a sensor stream into a cascade of five declarative programmable
stages: point, smooth, merge, arbitrate, and virtualize. Each stage operates on a different aspect
of the data, ranging from finest (single readings) to coarsest (readings from multiple sensors and other
data sources). The primary objective of the point stage is to filter out individual values (e.g.,
distance-based outliers). The smooth stage aggregates readings to output a processed
reading, and then advances the streaming window by one input reading. The merge stage uses spatial
commonalities in the input stream from a single type of device and clusters the readings into a single
group. The arbitrate stage filters out any conflicts, such as duplicate readings, between data streams
from different spatial clusters. The virtualize stage combines readings from different types of devices
and different spatial clusters.

Sarma [20] along with the developers of ESP proposed a quality estimation mechanism while 
object-detection data streams are cleaned to overcome the drawback of the sequence of different 
stages in ESP. The work essentially develops a quality check mechanism in parallel to the pipelined 
data cleaning processes. Every step of the cleaning mechanism is associated with a quality testing 
mechanism defined by means of two parameters, confidence and coverage. 

Confidence is defined as the measure of evidence of an object being present in a logical area;
the evidence is typically the data about the phenomena being sensed provided by different sensors.
Coverage is a window-level value assigned to a set of readings for a given time period T. It gives
the fraction of readings from a stream representing the actual environment in the time period T.


Figure 13.1 Fundamental pipeline stream architecture for data cleaning in wireless sensor
networks.


The confidence and coverage parameters are calculated for every instance of the incoming stream 
based on which the current quality of the cleaning mechanism is determined. 

The possibility of incorporating the sampling rate and a variable window size of the data stream
into the data cleaning process motivated the developers of ESP to propose SMURF [14]. SMURF is
an adaptive smoothing filter developed to provide a preamble to ESP’s architecture. It models
the unreliability of RFID readings by viewing RFID streams as a statistical sample of tags in the
physical world, and exploits techniques based on sampling theory to drive its cleaning processes.
Through the use of tools such as binomial sampling and estimators, SMURF continuously adapts
the smoothing window size in a principled manner to provide accurate RFID data to applications.

SMURF’s adaptive algorithm models the RFID data stream [21] as a random sample of the tags
in a reader’s detection range. It contains two primary cleaning mechanisms: one aimed at producing
accurate data streams for individual tag-ID readings (individual tag cleaning) and one providing accurate
aggregate estimates over large clusters of multiple tags using Horvitz-Thompson estimators [22]. It
focuses more on capturing the data sensed by the network than on filtering out the anomalies
(noise and data losses) in the data. The SMURF filter adapts and varies the size of the window
based on the evaluation of the binomial distribution of the observations. SMURF is a tool
used in ESP to achieve improved efficiency and accuracy while performing data cleaning.
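
The binomial reasoning behind window adaptation can be illustrated with a small sketch. It is not SMURF’s published algorithm; it only shows the underlying completeness argument: if a tag that is present is read in any single epoch with probability `p_read`, the chance of missing it in all `w` epochs of a window is `(1 - p_read)**w`, so the window must be long enough to push that chance below a tolerance `delta`. The function name and both parameters are assumptions made for illustration.

```python
import math


def min_window_size(p_read: float, delta: float) -> int:
    """Smallest window w (in read epochs) such that (1 - p_read)**w <= delta.

    p_read: assumed per-epoch probability of reading a tag that is present.
    delta:  tolerated probability of missing the tag in the whole window.
    """
    if not 0.0 < p_read <= 1.0:
        raise ValueError("p_read must be in (0, 1]")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    if p_read == 1.0:
        return 1  # a perfectly reliable reader needs a single epoch
    # Solve (1 - p_read)**w <= delta for w; both logarithms are negative.
    return max(1, math.ceil(math.log(delta) / math.log(1.0 - p_read)))


# A tag read half of the time needs 5 epochs to be missed with probability <= 5%.
window = min_window_size(0.5, 0.05)
```

A less reliable reader (smaller `p_read`) yields a larger window, which is the qualitative behavior described above: the window grows when readings are sparse and can shrink when they are dense.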

SMURF and ESP have been extended to develop a metaphysical independence layer [23]
between the application and the data gathered. The key philosophy behind metaphysical data
independence (MDI) is that sensor data are abstracted as data about the physical world; that is,
applications interact with a reconstruction of the physical world in the digital world, as if the
physical-digital divide does not exist. Essentially, MDI-SMURF is an RFID middleware platform
organized as a pipeline of processing stages with an associated uncertainty-tracking shadow pipeline.

MDI-SMURF uses temporal smoothing filters that rely on a statistical framework to correct for
dropped readings common in RFID data streams. Additionally, the filter estimates the resulting
uncertainty of the cleaned readings. These cleaned readings are then streamed into spatial-SMURF,
a module that extends temporal-SMURF’s statistical framework to address errors and semantic
issues that arise from multiple RFID readers deployed in close proximity.

Zhuang et al. [24], in contrast to ESP, presented a smart weighted moving average (WMA)
algorithm [25] that collects confident data from sensors according to the weights associated with
the nodes. The rationale behind the WMA algorithm is to draw more samples from a particular
node that is of great importance to the moving average, and to provide a higher confidence weight
for this sample, such that this important sample is quickly reflected in the moving average. In
order to extract the confident data, the sampling rate of the sensors
is increased by means of a weighting mechanism that depends on the identification of a significant
change in the incoming data stream. A change is said to be significant whenever there is enough
confidence to prove that the new value exceeds the prediction range. The identified sample, together
with its confidence level, is sent to the sink; if the sample is finally proved to be in the prediction
range, only the proved sample (without the confidence attached) is sent to the sink.

Elnahrawy and Nath [26-28], along the lines of WMA, use Bayesian classifiers [29] to map the
problem of learning and utilizing contextual information provided by wireless sensor networks
onto the problem of learning the parameters of a Bayes classifier. The adapted Bayes classifier is
later used to infer and predict future data expected from the WSN. The work proposes a scalable
and energy-efficient procedure for online, in-network learning of these parameters in a distributed
fashion by learning Bayesian networks. Elnahrawy and Nath use the current readings of
the immediate neighbors (spatial) and the last reading of the sensor (temporal) to incorporate the
spatial and temporal associations in the data. The authors use Markov models with short
dependencies in order to properly model the nonlinear observations provided by the network.

The model was not initially proposed to perform data cleaning. However, a new version of the
model was presented by the same authors [28] to identify, in an online fashion, the uncertainties
associated with the data that arise due to random noise. The approach combines prior knowledge of
the true sensor reading, the noise characteristics of the sensor, and the observed noisy reading in order
to obtain a more accurate estimate of the reading. This cleaning step is performed either at the
sensor level or at the base station.
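
As a minimal illustration of how a prior, a noise model, and a noisy observation can be combined, the sketch below assumes that both the prior belief about the true reading and the sensor noise are Gaussian. That assumption, the function name `fuse_reading`, and its parameters are made here for illustration only. Under it, the posterior mean is a precision-weighted average of the prior mean and the observation.

```python
def fuse_reading(prior_mean, prior_var, observed, noise_var):
    """Combine a Gaussian prior with a Gaussian-noise observation.

    Returns (posterior_mean, posterior_var) for the true reading.
    """
    if prior_var <= 0 or noise_var <= 0:
        raise ValueError("variances must be positive")
    total = prior_var + noise_var
    # The noisier the sensor, the more weight the prior receives, and vice versa.
    posterior_mean = (noise_var * prior_mean + prior_var * observed) / total
    posterior_var = (prior_var * noise_var) / total
    return posterior_mean, posterior_var


# Prior: 20 degrees (variance 4); sensor reports 24 degrees with noise variance 4.
mean, var = fuse_reading(20.0, 4.0, 24.0, 4.0)  # equal trust gives the midpoint
```

The posterior variance is always smaller than both input variances, which is the sense in which the cleaned estimate is more accurate than the raw reading.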

Peng et al. [30] combine the concepts of graph theory and business process logic to develop
a data cleaning model for RFID systems. In the model, related nodes collaboratively send and receive
messages, which makes it possible to detect and remove false positives and false
negatives to clean the data. The work envisages the WSN as a graph and the nodes as the vertices of
the graph. A small number of relevant vertices in the network are involved to form a data cleaning
cluster. Nodes in the cluster are related by the business processing logic. The P2P-collaborated
data cleaning process is divided into three phases: initialization phase, local correction phase, and
peer correction phase. In the initialization phase, when one of the tagged items is detected, the
information of the detected node is sent to the previous and next nodes that fall within the path
of traversal for the data from the node to the sink. In the local correction phase, the false negatives
detected are identified and corrected locally, whereas in the peer correction phase the cleaning is
performed using the previous, current, and next patterns as well as the messages received from
other adjacent nodes.
In general, a data cleaning mechanism for WSN attempts to solve the following problem:


Problem definition: Consider a set of spatially and temporally related nodes continuously monitoring
some distributed attributes. If $x_{n_i}(t)$ is the data received by the sink through node $n_i$ at time instant
$t$, then minimize the difference between the actual data representing the environment and the data
received at the sink, that is,


minimize $X_{n_i}(t) - x_{n_i}(t)$    (13.1)


where $X_{n_i}(t)$ is the actual data representing the environment.


13.3 Data Cleaning Using TOAD 


All the data cleaning mechanisms for WSN discussed in Section 13.2 utilize the spatiotemporal
associations among the nodes in one way or another. However, the majority of them fail to
verify or validate the extent of these spatiotemporal associations before exploiting them.
This deficiency is critical considering factors such as uncertainty in the communication medium,
unpredictable behavior of the sensing environment, and the low energy resources of the nodes, which
can cause the data to lose its spatiotemporal associations with other nodes and
with past data. To address this drawback, Ali et al. [12] presented a data cleaning design that
validates the spatial and temporal associations before the corresponding smoothing techniques are
used to clean the data.

The main feature of the design is the use of the time of arrival $t_a$ of data. Intuitively,
if there are delays in data arrival, the conditions are not conducive to communication and the
likelihood of error in the data is high.

Time of arrival identifies the current state of the network and the ability of a node to communicate
clean data, whereas correlation validation measures the degree of association/disassociation
among the nodes, so that an appropriate smoothing mechanism is identified and used for cleaning
the data.

The data cleaning mechanism, referred to as TOAD (Time Of Arrival for Data cleaning) [12],
embeds $t_a$ into the spatiotemporal characteristics of data and provides a belief-based mechanism to
filter out any anomalies in the data.

What demarcates TOAD from its peers is the presence of a belief mechanism that not only 
identifies nodes with highest confidence in providing clean data to the sink, but also selects an apt 
smoothing mechanism based on the confidence exhibited by the incoming stream of data. 


13.4 TOAD Architecture 


The architecture of TOAD, shown in Figure 13.2, comprises a belief component, a data smoothing
component, a rules component, and an arbitration component, classified according to their functionalities.
The rules component is responsible for creating rules and settings that affect the precision
of the output of the cleaning mechanism and the methodology adopted for smoothing the data.
The data smoothing component has three filters that manage the false positives
and false negatives [14] that arise in the data stream. The belief component is responsible for selecting
the method of cleaning and smoothing the data at a particular instance, spatially, temporally, or
spatiotemporally, from the average (A), temporal (T), and tap-exchange (TE) smoothing filters, respectively.
TOAD contains a feedback loop from the three smoothing filters to the belief component
in order to compare the predicted/estimated output with the received sample (if available), identify
the dirty and lost data, compensate for it, and provide a support mechanism in case of data losses.
Although there exists a vast literature on spatial and temporal smoothing procedures [16,26,31], an
adaptive filter approach [32,33] similar to the work presented in [34-36] is adopted to
elucidate the ability of the TOAD architecture.


Figure 13.2 TOAD architecture.


13.5 Preamble 


A WSN generates a stream of data, and the cleaning mechanism does not have the liberty of
processing the entire data set at a given instance. Techniques based on windowing, where a fixed
amount of data samples is temporarily stored, processed, and purged after a certain time limit, are
used to process data streams. The size of the window is critical for any data processing system, as
there are instances where more samples might be required to achieve better results.

Primarily, there are two techniques for defining the window size: 


1. Time based 
2. Count based 


The time-based window system temporarily stores data for the past few time instances. For
example, a window can store all the data samples that reached the sink in the past 5 min. The
drawback of this approach is that the space needed to store the samples is not known in advance, as the
number of samples available from the past $t$ time instants is not fixed. This poses a major challenge during
data processing, as the size of the window varies dynamically. The count-based approach, on the
other hand, uses a fixed number of samples in the window at any given instance of time and updates
the window for every newly arrived sample. The choice between time-based and count-based window
systems varies from application to application and depends on the sink’s storage and processing ability.
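
The difference between the two windowing techniques can be made concrete with a short sketch. The class names `CountWindow` and `TimeWindow` and their methods are hypothetical and chosen only for illustration; the point is that the count-based window has a fixed memory footprint, while the length of the time-based window depends on how many samples happen to arrive.

```python
from collections import deque


class CountWindow:
    """Keeps the newest `size` samples; memory use is fixed."""

    def __init__(self, size):
        self.samples = deque(maxlen=size)

    def push(self, value):
        # When the deque is full, appending evicts the oldest sample.
        self.samples.append(value)


class TimeWindow:
    """Keeps samples no older than `span` time units; length varies."""

    def __init__(self, span):
        self.span = span
        self.samples = deque()  # (timestamp, value) pairs, oldest first

    def push(self, timestamp, value):
        self.samples.append((timestamp, value))
        # Purge everything that has fallen out of the time span.
        while timestamp - self.samples[0][0] > self.span:
            self.samples.popleft()


count_window = CountWindow(size=3)
for reading in [21.0, 21.2, 21.1, 21.4, 21.3]:
    count_window.push(reading)          # always holds at most 3 readings

time_window = TimeWindow(span=300)      # e.g. the past 5 min, in seconds
for stamp, reading in [(0, 21.0), (100, 21.2), (350, 21.1)]:
    time_window.push(stamp, reading)    # holds however many arrived in time
```

With a count-based window, the downstream filter can be dimensioned once; with a time-based window, it has to cope with anything from an empty window to a burst of samples.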


13.6 TOAD Components 


TOAD uses the count-based window system and sets the window size equal to the number
of taps of the adaptive filter used for smoothing. Upon the arrival of a new sample from a node,
the window slides by deleting the oldest sample in it. Typically, the $N$-sized window contains the
current value and the past $N - 1$ samples of the data stream, and every window is responsible for
generating the $(N + 1)$th sample.


13.6.1 Belief Component 


The major difference between TOAD and the other data cleaning systems [14,26] is that TOAD
verifies the strength of correlation and measures the node’s ability to send clean data
uniformly. The belief component of TOAD is responsible for incorporating a mechanism that calculates
the correlation among the nodes along with their ability to send the data uniformly. The major
functionalities of the belief component are confidence evaluation of a node and error detection
(and correction), which are achieved by its respective modules.


13.6.1.1 Confidence Module 


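The primary objective of the confidence module is to evaluate the degree of belief in each node that
contributes to data cleaning, through a variable called the belief parameter. Let $R(n_i, n_j)$ be the correlation
coefficient [25] obtained between the previous $N$ samples of the nodes $n_i$ and $n_j$ ($N$ = size of the
window and number of taps used in the filter). Let

$$\mathbf{x}_{n_i}(k) = [\,x_{n_i}(k)\;\; x_{n_i}(k-1)\;\; \ldots\;\; x_{n_i}(k-N+1)\,]$$    (13.2)

be the previous $N$ samples at node $n_i$ and

$$\mathbf{x}_{n_j}(k) = [\,x_{n_j}(k)\;\; x_{n_j}(k-1)\;\; \ldots\;\; x_{n_j}(k-N+1)\,]$$    (13.3)

be the previous $N$ samples at node $n_j$; then the value of the correlation coefficient is given by

$$R(n_i, n_j) = \frac{\operatorname{cov}(x_{n_i}, x_{n_j})}{\sigma_{n_i}\sigma_{n_j}}
= \frac{E(x_{n_i}x_{n_j}) - E(x_{n_i})E(x_{n_j})}{\sqrt{E(x_{n_i}^2) - E^2(x_{n_i})}\,\sqrt{E(x_{n_j}^2) - E^2(x_{n_j})}}$$    (13.4)

where $\operatorname{cov}(x_{n_i}, x_{n_j})$ is the covariance between the nodes $n_i$ and $n_j$,

$$E(x_{n_i}) = \frac{\sum \mathbf{x}_{n_i}}{N}, \qquad E(x_{n_i}x_{n_j}) = \frac{\mathbf{x}_{n_i}\mathbf{x}_{n_j}^T}{N}$$    (13.5)

and $\sigma_{n_i}$ (standard deviation) $= \sqrt{E(x_{n_i}^2) - E^2(x_{n_i})}$.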
The value of the correlation coefficient has two properties that represent the degree of similarity
between the nodes [25]:


1. Magnitude: The magnitude of the correlation coefficient determines the extent of the relationship
between the two entities that are being compared. The maximum value of the magnitude
is 1 (highly correlated) and the minimum is 0 (uncorrelated).

2. Sign: The direction of the relationship is determined by the sign of the correlation coefficient. If
the correlation coefficient is positive, then the samples in the windows are related in a linearly
increasing order, whereas if the correlation coefficient is negative, then the samples in the
windows are related in a linearly decreasing order.


From Equations 13.4 and 13.5, it is evident that the value of the correlation coefficient
$R(n_i, n_j)$ tends to an indeterminate form if the standard deviation in any of the windows containing
the samples from the nodes is approximately equal to 0. A low standard deviation implies low variance,
which means that the values of the samples in the window hardly change (they are nearly constant). If
the standard deviation of any window is approximately equal to 0, then TOAD assigns 0 to the
correlation coefficient and subsequently 0 to the belief parameter.
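
The computation of the correlation coefficient over two windows, including the zero-variance guard just described, can be sketched as follows. The function name `correlation` and the tolerance `eps` that decides when a standard deviation counts as "approximately equal to 0" are assumptions of this sketch; the text does not prescribe a value. The centered sums used here are algebraically equivalent to the expectation form of the correlation coefficient but behave better numerically.

```python
import math


def correlation(window_i, window_j, eps=1e-6):
    """Correlation coefficient between two equally sized sample windows.

    Returns 0.0 when either window is (nearly) constant, mirroring the rule
    that a vanishing standard deviation yields a zero coefficient.
    """
    n = len(window_i)
    if n == 0 or n != len(window_j):
        raise ValueError("windows must be non-empty and of equal size")
    mean_i = sum(window_i) / n
    mean_j = sum(window_j) / n
    cov = sum((a - mean_i) * (b - mean_j) for a, b in zip(window_i, window_j)) / n
    sigma_i = math.sqrt(sum((a - mean_i) ** 2 for a in window_i) / n)
    sigma_j = math.sqrt(sum((b - mean_j) ** 2 for b in window_j) / n)
    if sigma_i < eps or sigma_j < eps:
        return 0.0  # indeterminate form: treat the pair as uncorrelated
    return cov / (sigma_i * sigma_j)


rising = correlation([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])    # close to +1
falling = correlation([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0])   # close to -1
flat = correlation([5.0, 5.0, 5.0, 5.0], [1.0, 2.0, 3.0, 4.0])      # exactly 0.0
```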

In order to accommodate inconsistencies in the time of arrival of data from the nodes, TOAD
uses a time consistency variable $\rho$, which is defined as a recurring sum of the normalized difference
between the waiting interval and the sampling interval of the data. The waiting interval here is the
time difference of arrival of two consecutive samples. The value of $\rho$ is maintained for every node
and is updated whenever a new sample arrives. Let $\delta$ be the sampling interval, that is, the time
duration after which each sample is sensed and expected to reach the sink, and $t_k$ be the time of arrival
of the $k$th sample from any node in a real-time scenario; the time consistency variable of a node can
then be defined as

$$\rho_{n_i}(k) = \rho_{n_i}(k-1) + \frac{\operatorname{abs}(|t_k - t_{k-1}| - \delta)}{\delta} - \frac{\operatorname{abs}(|t_{k-1} - t_{k-2}| - \delta)}{\delta}$$    (13.6)

$$\rho_{n_i}(1) = 1 + \frac{\operatorname{abs}(|t_1 - t_0| - \delta)}{\delta}$$

where $t_k - t_{k-1}$ is the waiting time spent by the module for the sample to arrive at the $k$th instance.
Because of its recursive nature, $\rho_{n_i}$ also depicts the ability of a node to consistently relay the data
at a fixed rate. In addition, $\rho$ gives the deviation of the actual arrival rate of data from the expected arrival
rate, which at times can also be used to assess communication inaccuracies. The larger the value of
$\rho_{n_i}$, the lower the reliability of the data received from the node, and vice versa. Whenever a new sample
arrives, the time consistency variable is reset to 1 to inform the system that the sample containing
the actual information about the environment has arrived from a specific node.
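
A minimal numerical sketch of the time consistency variable is given below. It implements only the initial-condition form, $\rho = 1 + \operatorname{abs}(|t_k - t_{k-1}| - \delta)/\delta$, applied to the most recent pair of arrivals; using this one-step form for every update, rather than the full recursion, is an assumption of the sketch, as is the function name `time_consistency`.

```python
def time_consistency(t_curr, t_prev, delta):
    """One-step time consistency value for a node.

    t_curr, t_prev: arrival times of the two most recent samples at the sink.
    delta:          sampling interval (expected spacing between arrivals).
    Returns 1.0 for a perfectly timed sample and grows with the deviation.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    waiting = abs(t_curr - t_prev)
    return 1.0 + abs(waiting - delta) / delta


on_time = time_consistency(t_curr=20.0, t_prev=10.0, delta=10.0)   # 1.0
late = time_consistency(t_curr=25.0, t_prev=10.0, delta=10.0)      # 1.5
early = time_consistency(t_curr=15.0, t_prev=10.0, delta=10.0)     # 1.5
```

Note that the deviation is symmetric: a sample that arrives too early is penalized as much as one that arrives equally late, since both indicate that the node is not relaying data at the expected rate.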

The belief parameter is defined as the ratio of the correlation coefficient to the time consistency
variable $\rho_{n_i}$. Based on the value of the belief parameter, one can identify the node that is correlated
and consistent enough to provide clean data to the sink:


$$B_{n_i n_j} = \frac{R(n_i, n_j)}{\rho_{n_i}}$$    (13.7)


The belief component uses sample windows and time consistency variables of nodes sensing the 
same attribute to calculate the degree of belief. 
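
The effect of dividing the correlation coefficient by the time consistency variable can be seen in a short worked sketch. The function name `belief`, the node labels, and the numeric values are hypothetical; which node’s time consistency value enters the ratio for a given pair is also treated here as an input rather than decided by the code.

```python
def belief(correlation_coefficient, rho):
    """Belief parameter: correlation coefficient divided by time consistency."""
    if rho <= 0:
        raise ValueError("rho must be positive")
    return correlation_coefficient / rho


# Two neighbors with the same correlation but different arrival behavior.
beliefs = {
    "n12": belief(0.95, 1.0),   # samples arrive exactly on schedule
    "n13": belief(0.95, 2.0),   # samples arrive with large timing deviation
}
best = max(beliefs, key=beliefs.get)  # the timelier neighbor wins
```

Because the time consistency variable is at least 1 in its initial-condition form, irregular arrivals can only reduce the belief below the raw correlation, never raise it.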


13.6.1.2 Error Detection and Correction Module 


The other major functionality of the belief module is to detect and rectify the corrupted data 
(outliers) that are present in the data stream. Although there exist many formal definitions of 
outliers, TOAD uses distance-based outliers and local metric-based outliers as described in [13].

If a sample of a data stream lies outside the feasible zone as specified by the application, then 
it is labeled as a distance-based outlier. The distance-based outliers can be handled well if the end 
application/user has prior information about the accurate threshold range that the samples of the 
data stream lie in. For example, the temperature of a steel furnace cannot reach 0 degrees during 
its period of operation. 

A sample of the data stream is termed a local metric-based outlier if its value differs
significantly from the mean value of the samples of similar nodes. The similarity between two nodes
in TOAD is calculated by using the correlation coefficient stated in Equation 13.4. Typically,
spatial relationships are used to rectify these outliers.

To identify local metric-based outliers, the belief module calculates the mean of the predicted
values of similar nodes from the spatial module and compares it with the incoming sample.
If the difference between them is significant, then the sample is termed a local metric-based
outlier. These kinds of outliers can be detected if and only if there is more than one highly correlated
node. The threshold for the difference can be regulated by the rules component of the system.
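
Both outlier checks reduce to simple comparisons, sketched below. The function names, the feasible range, and the threshold are illustrative assumptions; in TOAD these limits would come from the application and the rules component.

```python
from statistics import mean


def is_distance_outlier(sample, low, high):
    """True if the sample lies outside the application's feasible zone."""
    return not (low <= sample <= high)


def is_local_metric_outlier(sample, neighbor_predictions, threshold):
    """True if the sample deviates from the mean prediction of similar nodes.

    The check needs more than one highly correlated node; with fewer,
    no decision is made and the sample is not flagged.
    """
    if len(neighbor_predictions) < 2:
        return False
    return abs(sample - mean(neighbor_predictions)) > threshold


# A furnace reading of 0 degrees falls outside a hypothetical feasible zone.
furnace_alarm = is_distance_outlier(0.0, low=900.0, high=1700.0)

# A room reading of 30 degrees against neighbors predicting about 21.5 degrees.
room_alarm = is_local_metric_outlier(30.0, [21.0, 22.0], threshold=3.0)
```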


13.6.2 Smoothing Component 


Depending on the value of the belief parameter, the data stream smoothing process is selected from 
different smoothing mechanisms (temporal, average, and tap exchange). 

Table 13.1 provides the range of the belief parameter and the corresponding smoothing module 
employed for data cleaning. The range specified in Table 13.1 is calculated on the basis of existing 
literature on correlation coefficient specified in [25]. 


Table 13.1 Module Identification


Belief Range         Module Selected

0.8 < |B| < 1        Spatial

0.5 < |B| < 0.8      Spatiotemporal

|B| < 0.5            Temporal
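
The selection rule of Table 13.1 can be sketched as a small function. The table leaves the boundary values 0.5 and 0.8 unassigned; sending them to the lower module, as done here, is an assumption of this sketch, as is the function name `select_module`.

```python
def select_module(belief_value):
    """Map the magnitude of the belief parameter to a smoothing module."""
    magnitude = abs(belief_value)
    if magnitude > 0.8:
        return "spatial"          # average filter
    if magnitude > 0.5:
        return "spatiotemporal"   # tap-exchange filter
    return "temporal"             # adaptive filter only


choices = [select_module(b) for b in (0.93, -0.65, 0.2)]
```

Only the magnitude of the belief parameter is used, so a strongly negative value selects the same module as a strongly positive one.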
13.6.2.1 Temporal Filter


To support temporal data cleaning, TOAD uses adaptive filters [32]. Each filter
contains a window to accommodate $N$ samples of the data stream, and each sample is associated
with a weight $w_{n_i}(k)$ that is updated whenever the window slides. The sliding window is count
based; that is, whenever a new sample arrives, the oldest sample in the window is deleted and the
position of the existing samples is shifted by one place. The adaptive filter shown in Figure 13.3 uses
the current sample $x_{n_i}(k)$ and the previous $N - 1$ samples to generate the output $y_{n_i}(k)$, which is then
compared to $x_{n_i}(k+1)$ to find the error $e(k)$. The error is then used by the adaptive algorithm to
adapt to the incoming data stream. In case of any data losses, the error is kept constant and
the output $y_{n_i}(k)$ is fed back as the input. Let the filter deployed contain $N$ taps:


$$\mathbf{w}_{n_i}(k) = [\,w^{1}_{n_i}(k)\;\; w^{2}_{n_i}(k)\;\; \ldots\;\; w^{N}_{n_i}(k)\,], \quad i = 1, \ldots, p$$    (13.8)


where
$\mathbf{w}_{n_i}(k)$ is the set of window taps at the $k$th input sample
$n_i$ is the node identification variable


Let the desired data be denoted by $x_{n_i}(k+1)$, which is the $(k+1)$th instance of the sensed data
during adaptation. The error is given by $e(k) = x_{n_i}(k+1) - y_{n_i}(k)$, where
$y_{n_i}(k) = \mathbf{w}_{n_i}(k)\,\mathbf{x}^T_{n_i}(k)$ is
the output of the filter. Table 13.2 provides details of some adaptive algorithms* that are typically
deployed in machine learning systems to capture temporal relationships among the data. At the
physical layer of any communication system, these algorithms are used to minimize interference
between the signals [37]. The stability, robustness, and performance of these algorithms have
already been proven and demonstrated [32,38]. The algorithms mentioned in Table 13.2 can
also be used in TOAD to exploit the temporal relationships between the sensed data by updating
the window taps $\mathbf{w}(k)$ after the arrival of each sample from the nodes.

Each filter temporally evolves by generating weight vectors according to the equations specified
in Table 13.2. The weights essentially capture the pattern followed by the past data and vary their
values according to the characteristics of the data stream. If the weights vary minimally, then the filter
is said to have converged. It is at this instance that the error between the estimated and the desired
output is approximately equal to 0.


Figure 13.3 Adaptive filter. (From Ali, B.Q. et al., Wireless Commun. Mobile Comput., 12(5),
406, 2012. With permission.)


* The reader is requested to refer to the work by Simon Haykin [32,33] for further details about the
algorithms mentioned in Table 13.2.


Table 13.2 Adaptive Algorithms


Algorithm                       Tap Update Function

Least mean square               $\mathbf{w}_{n_i}(k+1) = \mathbf{w}_{n_i}(k) + \mu\, e(k)\, \mathbf{x}_{n_i}(k)$
                                $e(k) = x_{n_i}(k+1) - y_{n_i}(k)$

Normalized least mean square    $\mathbf{w}_{n_i}(k+1) = \mathbf{w}_{n_i}(k) + \frac{\mu}{\|\mathbf{x}_{n_i}(k)\|^2}\, e(k)\, \mathbf{x}_{n_i}(k)$
                                $e(k) = x_{n_i}(k+1) - y_{n_i}(k)$

Recursive least square          $\mathbf{K}(k) = \frac{\lambda^{-1}\mathbf{P}[k-1]\,\mathbf{x}(k)}{1 + \lambda^{-1}\mathbf{x}^T(k)\,\mathbf{P}[k-1]\,\mathbf{x}(k)}$
                                $\mathbf{P}[k] = \lambda^{-1}\mathbf{P}[k-1] - \lambda^{-1}\mathbf{K}(k)\,\mathbf{x}^T(k)\,\mathbf{P}[k-1]$
                                $\mathbf{w}_{n_i}(k+1) = \mathbf{w}_{n_i}(k) + \mathbf{K}(k)\, e(k)$
                                $e(k) = x_{n_i}(k+1) - y_{n_i}(k)$
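
The least mean square update, in which each tap is moved by the step size times the error times the corresponding input sample, can be sketched as a small class. The class name `LMSFilter`, the zero initialization of the taps, and the step size used in the example are assumptions of this sketch rather than settings taken from TOAD.

```python
class LMSFilter:
    """N-tap adaptive filter trained with the least mean square rule."""

    def __init__(self, taps, mu=0.01):
        self.weights = [0.0] * taps   # tap vector w(k), one weight per sample
        self.mu = mu                  # step size controlling adaptation speed

    def predict(self, window):
        """Filter output y(k): weighted sum of the samples in the window."""
        return sum(w * x for w, x in zip(self.weights, window))

    def update(self, window, desired):
        """Adapt the taps toward the desired sample x(k+1); returns e(k)."""
        error = desired - self.predict(window)
        self.weights = [w + self.mu * error * x
                        for w, x in zip(self.weights, window)]
        return error


# A constant stream is learned quickly: the error shrinks at every update.
lms = LMSFilter(taps=2, mu=0.1)
for _ in range(100):
    last_error = lms.update([1.0, 1.0], desired=1.0)
```

When a sample is lost, the same object can be used for prediction only: `predict` is called on the current window and its output is pushed into the window in place of the missing sample, without calling `update`.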

By using adaptive filters, the window of samples used to calculate the belief parameter can also 
be used by filters to forecast the future sample. 

The temporal module is used when the belief parameter is less than 0.5. If a false positive is
detected, then the corrupted sample at that instance is replaced by the output of the filter. For
false negatives, the filter performs prediction by replacing the null values with the output of the filter.
The error $e(k)$ is kept constant, as the sample $x_{n_i}(k+1)$ is unavailable. Also, during prediction
$y_{n_i}(k)$ is fed back to the filter, as the input data are unavailable. However, the adaptive algorithms
alone are not in a position to exploit the spatial relationships among the data.


13.6.2.2 Average Filter 


When the belief parameter is greater than 0.8, averaging is used for cleaning the data in TOAD. 
Averaging aggregates data in the space dimension utilizing readings from highly correlated sensors 
monitoring the same logical area and environment. The false positives and false negatives are 
replaced by the sample from the node which has the highest belief parameter. The correlation 
coefficient gives the strength and direction of linear relationships between two different nodes. The 
higher the value of the belief parameter, the higher the correlation between the nodes. 

If a false negative or a false positive needs to be replaced by the value generated by the average
filter, then the mean of the difference between the windows of samples is evaluated. The mean is
then added to the sample of the corresponding node to get the required sample. A high-level
description of the methodology is given in Algorithm 13.1.


Algorithm 13.1 Spatial module process of smoothing the data stream

1: Calculate $B_{n_i n_j}$ $\forall j$, $i = 1, \ldots, p$
2: Sort $B_{n_i n_j}$ $\forall j$, $i = 1, \ldots, p$
3: Calculate $\mu$ = mean$(x_{n_i}(k) - x_{n_j}(k))$ where $k = p, \ldots, (p + N)$
4: $x_{n_i}(k+1) = x_{n_j}(k+1) + \mu$
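
The averaging step can be sketched as follows: the neighbor with the highest belief is chosen, the mean offset between the two windows is computed, and that offset is added to the neighbor’s newest sample. The function name `spatial_estimate` and the dictionary layout are hypothetical, and the belief values are taken as given rather than computed here.

```python
from statistics import mean


def spatial_estimate(window_i, neighbors, beliefs):
    """Estimate node i's next sample from its most believed neighbor.

    window_i:  recent samples of the node whose sample is dirty or missing.
    neighbors: maps neighbor id -> (window_j, next_sample_j).
    beliefs:   maps neighbor id -> belief parameter for the pair (i, j).
    """
    best = max(beliefs, key=beliefs.get)            # highest belief parameter
    window_j, next_j = neighbors[best]
    offset = mean([a - b for a, b in zip(window_i, window_j)])
    return next_j + offset                          # shift neighbor's sample


neighbors = {
    "n12": ([20.0, 20.5, 21.0], 21.4),
    "n13": ([25.0, 25.0, 25.0], 25.0),
}
beliefs = {"n12": 0.90, "n13": 0.85}
estimate = spatial_estimate([21.0, 21.5, 22.0], neighbors, beliefs)  # about 22.4
```

Adding the mean offset, instead of copying the neighbor’s sample directly, preserves a constant bias between two nodes that track the same trend at different absolute levels.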
13.6.2.3 Tap-Exchange


Not all bursts of samples in the data stream are related spatially and temporally. There are situations
when the spatial and temporal relationships among the data have to be used simultaneously for
data estimation. The tap-exchange module of TOAD provides such a functionality when the
belief parameter lies between 0.5 and 0.8 (from Table 13.1). TOAD exploits the temporal
associations using linear adaptive filters, and introduces the spatial association into the design by
exchanging the weights associated with each sample within the window of the temporal filter, as
shown in Figure 13.4. Essentially, the weights represent changes in the environment that is being
sensed. A high-level description of the algorithm is given in Algorithm 13.2.


[Figure: block diagram with adaptive filters for node 1 (n1), node 2 (n2), and node 3 (n3), labeled as
filters to exploit the temporal relationship, together with arrival rate and belief parameter blocks.]


Figure 13.4 Tap-exchange smoothing process.


Algorithm 13.2 Spatiotemporal module

1: Identify $n_j$ with highest $\beta_{n_i n_j}$ $\forall j = 1, \ldots, p$
2: Replace $w^{n_i}(k)$ with $w^{n_j}(k)$
3: Estimated output: $y_{n_i}(k) = w^{n_i}(k)\, x^{n_i}(k - 1 - D)$
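The tap-exchange idea can be sketched in Python as follows. The text specifies only "linear adaptive filters"; the least-mean-squares (LMS) update, the step size `mu`, and the names `lms_step` and `tap_exchange_estimate` are assumptions chosen for illustration, not the authors' implementation.

```python
def lms_step(weights, window, desired, mu=0.01):
    """One LMS update (assumed filter type): predict, then adapt the taps."""
    prediction = sum(w * x for w, x in zip(weights, window))
    error = desired - prediction
    new_weights = [w + mu * error * x for w, x in zip(weights, window)]
    return new_weights, prediction


def tap_exchange_estimate(target_window, neighbor_weights, beliefs):
    """Estimate a sample of the target node with taps borrowed from a neighbor.

    target_window    -- delayed samples of the node being cleaned
    neighbor_weights -- {node_id: current filter taps of that neighbor}
    beliefs          -- {node_id: belief parameter of that neighbor}
    """
    # Identify the neighbor with the highest belief parameter.
    best = max(beliefs, key=beliefs.get)
    # Exchange the taps: use the neighbor's weights in the target's filter.
    borrowed = neighbor_weights[best]
    # Filter the target's own delayed samples with the borrowed weights.
    return sum(w * x for w, x in zip(borrowed, target_window))
```

Each node would keep adapting its own taps with `lms_step` while samples arrive; the exchange is needed only when the target node's samples are missing or suspect.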
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Figure 13.5 Predominant data loss in the sensed data. (a) Node 11 data loss, (b) node 12 data 
loss, and (c) node 13 data loss. (From Ali, B.Q. et al., Wireless Commun. Mobile Comput., 12(5), 
406, 2012. With permission.) 


13.6.3 Arbitration 


The arbitration component avoids possible conflicts between the readings from different sensing
nodes that are physically close to one another and sense the same attribute. Any aggregation
functionality requested by the application is handled by this component. The
arbitration component also filters out contradictions, such as duplicate readings, between data streams
from different spatially related nodes.


13.7 Case Study 


In this section, the data cleaning mechanism TOAD is tested by means of simulation on real-time data
sets provided by Intel Labs at MIT [39].

Nodes 11, 12, and 13, which measure the temperature of the same room (Figure 13.6) for a duration
of 3 h and 13 min from 1:12 AM to 4:25 AM on February 28, 2004, are taken into consideration.
From Figure 13.7, it is evident that the data from nodes 11, 12, and 13, although spatially correlated,
do not change linearly with each other, and any assumption made to utilize this spatial relation
results in an incorrect estimated output.
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Figure 13.6 MIT sensor test bed. (From Ali, B.Q. et al., Wireless Commun. Mobile Comput., 
12(5), 406, 2012. With permission.) 
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Figure 13.7 Real-time data from nodes 11, 12, and 13 from MIT Lab. (From Ali, B.Q. et al., 
Wireless Commun. Mobile Comput., 12(5), 406, 2012. With permission.) 


Table 13.3 illustrates the calculation of the belief parameter and the selection of the smoothing
process used to clean the data. The rules component (shown in Figure 13.2) is used to set the
values of the belief parameter on which the selection of the smoothing process is based.

The correlation coefficients between nodes 11 and 12 and between nodes 11 and 13 are denoted R11,12
and R11,13, whereas ρ12 and ρ13 denote the time consistency variables (Equation 13.6). The belief
parameter defined as the ratio of the correlation coefficient R and the time consistency variable
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Table 13.3 Selection Process of the Cleaning Methodology 


x11     | R11,12  | ρ12    | R11,13 | ρ13    | β11,12  | β11,13 | A | TE | T
15.4706 | -0.3483 | 2.1000 | 0.7258 | 4.0000 | -0.1659 | 0.1814 |   |    | ✓
15.5082 | -0.2288 | 1.0000 | 0.6830 | 1.0000 | -0.2288 | 0.6830 |   | ✓  |
15.6222 |  0.2459 | 2.0000 | 0.9983 | 2.0000 |  0.1230 | 0.4991 |   |    | ✓
15.4706 |  0.5706 | 1.0000 | 0.9970 | 3.0000 |  0.5706 | 0.3323 |   | ✓  |
15.3540 | -0.0421 | 2.0000 | 0.9182 | 4.0000 | -0.0210 | 0.2295 |   |    | ✓
15.7504 |  0.3120 | 1.0000 | 0.9489 | 1.0000 |  0.3120 | 0.9489 | ✓ |    |
15.4804 |  0.2362 | 1.0333 | 0.9512 | 2.0000 |  0.2285 | 0.4756 |   |    | ✓
15.4706 |  0.1319 | 1.0000 | 0.9718 | 1.0000 |  0.1319 | 0.9718 | ✓ |    |
15.4608 |  0.1611 | 1.0000 | 0.9534 | 2.0000 |  0.1611 | 0.4767 |   |    | ✓
15.5127 |  0.3412 | 2.0000 | 0.9854 | 1.0000 |  0.1706 | 0.9854 | ✓ |    |
15.4097 |  0.7949 | 1.0000 | 0.8334 | 1.0000 |  0.7949 | 0.8334 | ✓ |    |


Note: A, averaging; TE, tap exchange; T, temporal. 


ρ is calculated using Equation 13.7. According to measurements made at the Intel Lab [39], only
42% of the estimated data reached the sink. Although the environment was indoor and controlled,
unreliable wireless communication and low energy resources resulted in a huge loss of data. This
can be corroborated by examining Figures 13.7 and 13.5.
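The selection logic behind Table 13.3 can be sketched in Python as follows. The ratio and the 0.8 and 0.5 thresholds come from the text; the function names, the handling of the exact boundary values, and the rule that any belief below 0.5 selects temporal smoothing are assumptions, since Table 13.1 is not reproduced here.

```python
def belief_parameter(correlation, time_consistency):
    """Belief parameter as the ratio of correlation to time consistency."""
    return correlation / time_consistency


def select_smoothing(beliefs):
    """Pick a smoothing process from the highest belief among the neighbors."""
    best = max(beliefs)
    if best > 0.8:
        return "averaging"
    if best >= 0.5:  # boundary handling is an assumption
        return "tap-exchange"
    return "temporal"  # assumed default for low belief values


# First row of Table 13.3: R11,13 = 0.7258 and a time consistency of 4.0.
row_beliefs = [belief_parameter(-0.3483, 2.1), belief_parameter(0.7258, 4.0)]
choice = select_smoothing(row_beliefs)
```

With these inputs both belief values stay below 0.5, so the sketch falls back to temporal smoothing.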

Figures 13.8 and 13.9 illustrate the behavior of correlation coefficient, variation in time con- 
sistency, and fluctuation of belief parameter observed when TOAD is used to clean the data from 
node 11 using nodes 12 and 13. 

The correlation coefficient, as shown in Figures 13.8a and 13.9a, varies between −1 and +1 with respect to
the incoming data streams from nodes 11, 12, and 13. Equation 13.4 reaches an indeterminate
form when the variance in the participating windows is very low. In such cases, TOAD sets
the value of the correlation coefficient, and subsequently of the belief parameter, to 0; that is, it
selects the temporal smoothing mechanism for cleaning (from Table 13.1). Essentially, low variance
implies that there is minimal change between successive values of the window, that is, the
samples are nearly constant within the window. Under these circumstances, temporal smoothing is
the best-suited method for estimating the corrupted data, as the error between the previous sample
and the present sample is very low.
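The low-variance guard can be sketched in Python as follows. Equation 13.4 is not reproduced in this section, so the sketch assumes it is the standard Pearson correlation over two windows; the function name `window_correlation` and the tolerance `eps` are hypothetical.

```python
from math import sqrt


def window_correlation(xs, ys, eps=1e-12):
    """Pearson correlation of two equal-length windows, with a low-variance guard."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # A near-constant window makes the ratio 0/0; return 0 so that the
    # belief parameter is also 0 and temporal smoothing is selected.
    if sxx < eps or syy < eps:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)
```

Returning 0 instead of raising an error keeps the cleaning pipeline running when a window is flat, which matches the behavior described in the text.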

Figures 13.9b, 13.11b, and 13.12b display the value of the time consistency variable ρ for nodes
13, 11, and 12, respectively. The value of ρ increases when the time interval between successive
sample arrivals increases. The maximum value of ρ is attained for node 11, when there is a gap of
32 samples, suggesting that communication from this node is unreliable and that its data are more
likely to be corrupted or lost. As soon as a new sample arrives, the time consistency variable is
reset to 1 (from Equation 13.6), restoring some confidence in the node's ability to provide clean data.
The plot of ρ for node 12 (Figure 13.12b) is denser around the value 1, making its data more
reliable for spatial cleaning, whereas the data from the other nodes are scattered
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Figure 13.8 Evolution of belief parameter from correlation coefficient and time consistency 
variable for node 11 with respect to node 12. (a) Node 11 correlation with node 12, (b) time 
consistency variable of node 12, and (c) belief parameter. (From Ali, B.Q. et al., Wireless 
Commun. Mobile Comput., 12(5), 406, 2012. With permission.) 


throughout, indicating a higher likelihood of unreliability. From the time consistency
variable one can infer that there is some problem during communication, but not the cause of the
problem.

Figures 13.8c and 13.9c depict the behavior of the belief parameters for node 11 with respect to
nodes 12 and 13. The plots convey how much confidence can be placed in nodes 12 and 13 when
choosing among the different smoothing mechanisms used to clean the data of node 11. Like the
correlation coefficient, the belief parameter lies between −1 and +1.

Figure 13.10 shows that TOAD suppresses the false negatives in the data from node 11 by
dynamically selecting one of the smoothing mechanisms discussed in Section 13.4. Data losses
in Figure 13.10 can be inferred by locating the regions where the “*” markers in the curve are absent or less dense.

The correlation coefficient, time consistency variable, and the belief parameters for node 13 with 
respect to nodes 11 and 12 are shown in Figures 13.11 and 13.12. From Figure 13.13, similar to 
that of node 11, it is evident that the false negatives are suppressed by TOAD using an appropriate 
smoothing mechanism. 

An interesting observation can be made by examining Figures 13.9a and 13.11a, depicting the 
behavior of correlation coefficient, which are similar due to the commutative property of covariance; 
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Figure 13.9 Evolution of belief parameter from correlation coefficient and time consistency 
variable for node 11 with respect to node 13. (a) Node 11 correlation with node 13, (b) time
consistency variable of node 13, and (c) belief parameter.
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Figure 13.10 Clean data after smoothing for node 11. 
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Figure 13.11 Evolution of belief parameter from correlation coefficient and time consistency 
variable for node 13 with respect to node 11. (a) Node 13 correlation with node 11, (b) time 
consistency variable of node 11, and (c) belief parameter. 


however, the belief parameters of nodes 11 and 13 vary with respect to each other because of the 
different time consistency variables. 

The previous simulations illustrated the effectiveness of TOAD in handling false negatives (i.e.,
data losses). TOAD can also suppress false positives, that is, corrupted data
(distance-based as well as metric-based outliers [13]). A sample is termed
a distance-based outlier if it falls outside the range specified by the rules component. The exact
outlier range is not always available; hence, distance-based outlier identification
works well if the application has prior information about the approximate behavior of the input
data stream.

To identify and rectify the metric-based outliers, TOAD compares the estimated output of 
the spatial module with that of the received sample whenever the belief parameter is within 
the acceptable range. If the error after comparison exceeds the threshold specified by the rules 
component, then the incoming sample is termed as an outlier and the estimated data from the 
spatial module are used to reinstate the data stream. 
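The two outlier checks can be sketched in Python as follows. The 10 to 20 value range and the 0.8 to 1 belief range are the settings used in the test described in this section; the error threshold `max_error`, the labels, and the function names are hypothetical, and the sketch rectifies only metric-based outliers because that is the replacement rule the text states.

```python
def detect_outlier(sample, spatial_estimate, belief,
                   value_range=(10.0, 20.0), belief_range=(0.8, 1.0),
                   max_error=1.0):
    """Label a sample as a "distance" outlier, a "metric" outlier, or "none"."""
    low, high = value_range
    # Distance-based check: the sample falls outside the configured range.
    if not low <= sample <= high:
        return "distance"
    b_low, b_high = belief_range
    # Metric-based check: only trusted when the belief parameter is acceptable.
    if b_low <= belief <= b_high and abs(sample - spatial_estimate) > max_error:
        return "metric"
    return "none"


def rectify(sample, spatial_estimate, label):
    """Replace a metric-based outlier with the spatial module's estimate."""
    return spatial_estimate if label == "metric" else sample
```

A sample of 18.0 against a spatial estimate of 15.5 is caught only when the belief parameter is inside the accepted range, which illustrates why outliers inside the value range can slip through when the belief is low.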

To further test TOAD, the incoming data stream of node 11 is randomly corrupted. A range of
10° to 20° is specified to detect the distance-based outliers, and the belief parameter range is set to
0.8 to 1 in order to detect the metric-based outliers. Figure 13.14 shows that the incoming data stream
from node 11 is corrupted and the samples are scattered all over the plot, whereas TOAD cleans
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Figure 13.12 Evolution of belief parameter from correlation coefficient and time consistency 
variable for node 13 with respect to node 12. (a) Node 13 correlation with node 12, (b) time 
consistency variable of node 12, and (c) belief parameter. 


and smooths the data using its design and architecture. However, there are some instances, such
as an outlier falling within the specified range, when TOAD is unable to suppress the outliers.
When Figures 13.8c, 13.9c, and 13.14 are compared, we observe that in the zones where the belief
parameter is high, data cleaning is achieved more effectively than in the zones where the
belief parameter is low. If the belief parameter attains high values, then even the outliers that lie
within the distance-based range can be identified and suppressed, as shown in Figure 13.14.


13.8 Summary 


Due to their pervasive nature and ability to characterize an environment, WSNs have begun to
influence the day-to-day activity of human society. As a result, massive amounts of data reflecting the
behavior and dynamism of the sensed environment are gathered and processed. The end application
utilizes these data and takes an appropriate action. If the gathered data are corrupted, missing, or
inconsistent with reality, the application fails to make the right decision.

Data cleaning in WSNs has emerged as an in-demand research topic that aims to ensure that the data
gathered from a WSN are consistent, complete, and clean. Although data cleaning serves a critical role
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Figure 13.13 Clean data after smoothing for node 13. 
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Figure 13.14 Outlier detection and rectification for node 11. 


in determining the effectiveness of a WSN, to date there is no generic model in the
literature that performs well under varying and challenging environmental conditions. This chapter
summarizes present-day data cleaning mechanisms in WSNs and discusses their
pros and cons. An in-depth description is also given of TOAD, a data cleaning mechanism that uses the time
of arrival of data in conjunction with validation of spatiotemporal associations among the sensor
nodes. The novelty of TOAD compared with other data cleaning mechanisms
is that it accommodates changes in the spatiotemporal associations among the nodes of a WSN
and uses them to clean the gathered data.
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14.1 Introduction 


This chapter introduces a methodology for sensor stream reduction in wireless sensor networks. This 
methodology allows the integration of techniques and tools in sensor stream reduction applications. 
The main hypothesis behind the proposed methodology is as follows:


An appropriate methodology enables efficient prototyping of new wireless sensor
network applications. This prototyping covers the design, analysis, implementation,
test, and deployment phases. The applications can involve hardware, software, and
networking aspects.


Considering a sensor stream reduction application, a basic methodology is composed of four 
phases: characterization, reduction tools, robustness, and conception. These phases are illustrated 
in Figure 14.1. 




[Diagram. Phases: characterization, reduction support, robustness, conception. Element labels:
reduction architecture, specific evaluations.]

Figure 14.1 Steps to develop a sensor reduction application.


An appropriate methodology must provide (1) a conceptual model to help the reduction
application design, that is, the characterization phase; (2) a reduction application programming
interface (reduction API) that can be used in software or hardware, that is, the reduction tools
phase; (3) a conceptual model to allow an adequate evaluation of the reduction strategies, that is, the
robustness phase; and (4) a user-friendly tool to test, develop, and deploy the reduction strategies,
that is, the conception phase. A brief description of each phase is presented in the following:


Characterization: This phase provides the list of requirements of sensor stream reduction
applications and it is applied to the following elements: (1) the reduction architecture, which can be
constantly redefined and used in different scenarios; (2) data sensor characterization, which uses
models that describe the sensed stream. This characterization is important to identify the
data behavior, allowing the development of specific and efficient reduction algorithms; and
(3) application requirements, which list the requirements to be considered in
the design of the reduction algorithms, for example, the quality of service (QoS) requirements
considered when designing a specific reduction algorithm.

Reduction tools: This phase provides support for reduction application development and it is
applied to the following elements: (1) the reduction API, which contains all available reduction
algorithms and can be combined with network mechanisms or used directly in applications; (2) infrastructure,
which represents the network mechanisms that can be combined or used with some reduction
strategy, for example, clustering based on sensed data; and (3) applications that need to
execute an explicit reduction strategy, for example, user query applications; in this case, the
sensor nodes need to store reduced metadata used to answer the initial query.

Robustness: This phase provides models that allow the validation of the algorithms and it is applied
to the following elements: (1) specific evaluations are used when restricted models are
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considered to represent a specific data sensed. These evaluations can be applied to obtain a 
better classification among different algorithms considering a specific application; (2) network 
optimization is necessary when the requirements have to be optimized. In this case, different 
optimization problems can be investigated to enable the network or algorithm optimization; 
and (3) methodology of validation provides the definition of conceptual models that com- 
bine the evaluation of reduction algorithms with optimization models considering the users, 
infrastructure, or application requirements. 

Conception: This phase integrates the simulation and deployment mechanisms and it is applied
to the following elements: (1) development tools, which enable easy and fast application
prototyping; (2) simulation, which incorporates the characterized sensor stream, reduction algorithms, and
infrastructure mechanisms into the simulators; and (3) deployment strategy, which must be used to
allow the solution to be realized in real environments.


This methodology is an important aspect to be considered during the reduction application 
development since other applications or solutions do not consider the relation between “data and 
infrastructure” and neither do they assess adequately the quality of the reduction. 


14.2 Sensor Stream Characterization 


Wireless sensor network applications, generally, have different configurations, data types, and 
requirements. Thus, it is important to consider a conceptual model to assist the network managers 
in the specification of application requirements. In our case, this model will help the sensor stream 
reduction applications. 

The basic element of characterization is the reduction architecture proposed by Aquino [1]. This 
architecture describes a conceptual model that considers general data applications, for example, 
temperature, pressure or luminosity monitoring. The data reduction process, in this conceptual 
model, consists of the sensed data characterization, that is, how the data can be collected and 
represented, and the application specification, that is, what application requirements have to be 
considered, for example, real-time deadlines, energy consumption, or QoS guarantee. 

The other elements related to characterization model of reduction applications are discussed in 
the following. 


Data sensor characterization: To process the data, it is important to know the behavior of
the monitored phenomenon and how it is represented by the samples taken from the environment.
Considering this element, the main question is


How is the monitored phenomenon characterized and represented in a space-temporal 
domain? 


Based on this question the following assumptions can be considered:
■ The phenomenon is modeled via statistical approaches.
■ Using statistical models that describe the monitored data, more efficient and specific
algorithms for each phenomenon can be proposed.
Previous work on this question can be found in Frery et al. [2], which proposes a
representation of the sensing field used to characterize only one instant of the space-temporal
monitored domain.
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Application requirements. Other important sets of requirements are determined by reduction 
application. These requirements—energy consumption, delay, and data quality—allow an 
efficient data monitoring. Considering this element, the main question is 


How are the application requirements identified and evaluated to allow an efficient 
sensor stream reduction? 


Based on this question the following assumptions can be considered: 
■ The model must provide a tabulation of the application requirements. With this, it is
possible to identify “how and where” the reduction algorithms can be used.
■ Mathematical models have to be proposed in order to describe the impact of reduction
on each application requirement, or on a group of them.
Previous work on this question can be found in Aquino et al. [3,4], which shows
a tabulation for a real-time application. In this work, a mathematical model to identify the
appropriate amount of data to be propagated to the sink is proposed. Moreover, a
data-centric routing algorithm is presented to ensure that the data are delivered on time.


For instance, consider the general architecture proposed by Aquino [1]. This architecture shows
a brief instance of the characterization phase applied to a wireless sensor network scenario. The way the
streams are reduced depends on the moment at which the reduction is performed, that is,
during sensing or routing.

Figure 14.2 [1] shows that the input streams must have been originated by the phenomena (sens- 
ing stream) or sent to the sensor node by another node (routing stream). This illustration represents 
the specific reduction architecture element described previously and depicted in Figure 14.1. 

The sensing reduction is recommended when the sensor device gets an excessive number of 
samples, and it cannot be dynamically calibrated to deal with more data than it currently deals 
with. The routing reduction, in contrast, is performed when the network does not support the 
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Figure 14.2 General sensor stream architecture. 
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amount of data being transmitted. This representation defines the data sensor characterization 
element described previously and depicted in Figure 14.1. 

For instance, if the application has some requirements regarding the amount of data supported
by each sensor node, data stream reduction can avoid uncontrolled data loss while guaranteeing
the application requirements. Note that the sensing stream arrives at the application layer, while
the routing stream arrives at the network layer. This identification represents the application
requirements element described previously and depicted in Figure 14.1.

The data reduction process, represented in Figure 14.2, is described as follows [1]: when data
arrive at the network layer, the network packets first need to be unpacked, separating the data stream
from the header (which may contain some specific information or restriction of the application). Once the
stream is unpacked, it is sent to the application layer to be processed in the same way as the sensing
streams. At the same time, stream information is given to a cross-layer (labeled in Figure 14.2
as “stream information”), which is responsible for interfacing between the application and
network layers. The “stream information,” highlighted in Figure 14.2, is responsible for choosing
which reduction algorithm should be executed and its parameters. The information stored in this
cross-layer includes the following:


Feedback: Data received from other sensor nodes in order to perform the reduction calibration 
in an online fashion. 

Application information: Data received from the network layer when the stream is unpacked. 

Data stream type: Data received from the application layer after the data stream is classified. 

Reduction parameters: Data given to the application layer guiding it to perform an appropriate
reduction, considering the “application information” and the “data stream type.”

Reduction information: Data received from the application layer after the reduction. 

New application information: Data given to the network layer for packing the reduced 
stream out. 
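The items stored in the cross-layer can be pictured as a simple record. The following Python sketch is only an illustration of that list: the class name `StreamInformation`, the field names, and the use of dictionaries as containers are assumptions and are not taken from the architecture in [1].

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class StreamInformation:
    """Hypothetical container for the cross-layer items listed above."""
    # Data received from other sensor nodes for online calibration.
    feedback: Dict[str, Any] = field(default_factory=dict)
    # Data received from the network layer when the stream is unpacked.
    application_information: Dict[str, Any] = field(default_factory=dict)
    # Data received from the application layer after classification.
    data_stream_type: Optional[str] = None
    # Data given to the application layer to guide the reduction.
    reduction_parameters: Dict[str, Any] = field(default_factory=dict)
    # Data received from the application layer after the reduction.
    reduction_information: Dict[str, Any] = field(default_factory=dict)
    # Data given to the network layer for packing the reduced stream.
    new_application_information: Dict[str, Any] = field(default_factory=dict)


info = StreamInformation(data_stream_type="univariate")
info.reduction_parameters["sample_size"] = 64  # illustrative value only
```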


When data streams arrive at the application layer, they first have to be classified according to the
number of variables that they monitor. In this context, data streams can be univariate or multivariate. 
Univariate streams are represented by a set of values read by a unique type of sensor, for example, 
a sensor node that monitors only environmental temperature. On the other hand, multivariate 
streams are represented by a set of values coming from different sensors of the same sensor node, 
for example, a node that monitors temperature, pressure, and humidity simultaneously, or by a set 
of measurements coming from the same sensor type located in different sensor nodes, for example, 
a node that processes data from different nodes monitoring only temperature. This classification is 
important because the data reduction process itself depends directly on the stream type. 
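The classification rule above can be sketched in Python as follows. Representing a stream by the set of (node, sensor type) pairs that feed it is an assumption made for this example, as is the function name `classify_stream`.

```python
def classify_stream(sources):
    """Classify a stream from the (node_id, sensor_type) pairs that feed it.

    One distinct pair means a single sensor type on a single node (univariate);
    several pairs mean several sensor types on one node or the same sensor
    type on several nodes (multivariate).
    """
    return "univariate" if len(set(sources)) <= 1 else "multivariate"
```

For example, `classify_stream([("n1", "temperature"), ("n2", "temperature")])` treats temperature readings gathered from two nodes as a multivariate stream.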

After the stream type is known, we have to choose an appropriate stream reduction algorithm
to effectively perform the reduction. There are various types of data stream reduction methods,
such as online sampling, histogram construction, and sketches [5,7,9]. The reduction algorithms available
in our API will be described in Section 14.3. At the end, the reduced data stream is obtained; if the
stream was being routed, it is passed back to the network layer, which packs the stream and any
information gathered from the cross-layer and sends it to the sink.

In order to illustrate the execution of this specific architecture consider Figure 14.3. 

This figure depicts two specific problems: the first one considers a general sensor stream
application where a scheduled reduction has to be performed in the source node, that is, it receives data
about a phenomenon for a certain time and then sends it to the sink; and the second one addresses
a real-time application where the reduction is performed during routing.
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Figure 14.3 General architecture. 


14.3 Reduction Tools 


A group of reduction solutions must be available to help the application development. Some 
examples of these solutions are the reduction API, the infrastructure network combined with 
reduction techniques, and reduction strategies applied in specific applications. These solutions can 
consider hardware or software implementation. 

The basic element of reduction support is the reduction API available in the reduction architecture
proposed by Aquino [1]. The API algorithms are based on sampling, sketch, and wavelet stream
techniques and they can be applied to univariate or multivariate data streams [6,8,10,11]. Other
algorithms can be easily integrated into the API. The general idea is to allow the reduction
API to be combined with elements of the network infrastructure, for example, routing or density control.
In addition, the API is available to applications, for example, query processing, network management,
and software reconfiguration.

The elements related to reduction support are discussed in the following. 


Reduction API. Although the reduction API is already available in the methodology, the 
investigation of new algorithms is always necessary. Considering this element, the main 
question is 


Is it possible to improve the data reduction algorithms considering the data 
representativeness and saving the network resources? 


Based on this question, the following assumptions can be considered:
■ By using data reduction algorithms it is possible to save network resources while keeping the
data representativeness.
■ Considering the query processing applications, we can use the sketched data to provide
an approximate answer to some query. The sketched data are used instead of the sensed
data due to the low cost of storing them in the sensor node.
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A previous knowledge about this question can be seen in Aquino et al. [3,4,8], which present 
a set of reduction algorithms based on data stream techniques. The results show that the 
assumptions described previously can be considered for this algorithm class. 

Infrastructure: The network infrastructure uses the sensed data to improve itself. The data can be used
in sampled or sketched form. Considering this element, the main question is


Is it possible to use the data reduction to assist the network infrastructure? 


Based on this question the following assumption can be considered: The data can be reduced 
during the data routing based on some network requirements, for example, energy consump- 
tion, packet delay, and data quality. In this way, the reduction is performed to achieve these 
requirements. 
Previous work on this question can be found in Aquino and Nakamura [4], which
shows a data-centric routing algorithm for real-time applications.

Applications: As expected the reduction strategies can be used directly in the application layer. 
Some applications that use the reduction directly are listed as follows: 

■ Quality of service: Among the QoS parameters we can include the data quality
parameter. With this, the application has to guarantee a minimum specified data quality.
The data are reduced considering a given data representativeness.
Another aspect is the use of reduction to achieve some infrastructure parameters, for
example, energy, delay, or packet loss. These parameters are degraded by large data volumes.
The reduction can be used to improve the achievement of these parameters.

■ Real-time applications: The real-time applications in sensor networks, generally, are
soft real-time applications. This occurs because the environment is not controlled and
the applications use approximate methods to meet the required deadlines. Thus, we
can use the routing combined with the data reduction algorithms to achieve soft
deadlines. Previous work on these kinds of applications can be found in Aquino
et al. [3,4,10].


The main aspect of this phase is the reduction API element. As examples, the wavelet-, sampling-,
and sketch-based algorithms are presented.


14.3.1 Wavelet-Based Algorithm 


As described in Aquino et al. [8], the wavelet transform of a function $f(x)$ is a two-dimensional
function $\gamma(s, \tau)$. The variables $s$ and $\tau$ represent the new dimensions, scale and translation,
respectively.

The wavelet transform is composed of functions $\phi(t)$ and $\psi(t)$ in which the admissibility
condition $\int \psi(t)\, dt = 0$ holds. This condition states that the wavelet has a compact support, since
the biggest part of its value is restricted to a finite interval, which means that it has an exact zero
value outside of this interval. Besides, $\psi_{j,k}(t) = 2^{j/2}\, \psi(2^j t - k)$, for the dilated and translated versions
of $\psi(t)$. Both these cases characterize the wavelet spatial localization property.

Another property is the wavelet smoothness. Consider the expansion of $\gamma(s, \tau)$, around $\tau = 0$,
in a Taylor series of order $n$,

$$\gamma(s, 0) = \frac{1}{\sqrt{s}} \left[ \sum_{p=0}^{n} f^{(p)}(0) \int \frac{t^p}{p!}\, \psi\!\left(\frac{t}{s}\right) dt + O(n + 1) \right],$$

where $f^{(p)}$ is the $p$th derivative of $f$ and $O(n + 1)$ represents the rest of the expansion. A wavelet
$\psi(t)$ has $L$ null moments if

$$\int_{-\infty}^{\infty} t^k\, \psi(t)\, dt = 0$$

for $0 \le k < L$. If

$$M_k = \int t^k\, \psi(t)\, dt$$

we have

$$\gamma(s, 0) = \frac{1}{\sqrt{s}} \left[ \frac{f^{(0)}(0)}{0!} M_0\, s + \frac{f^{(1)}(0)}{1!} M_1\, s^2 + \cdots + \frac{f^{(n)}(0)}{n!} M_n\, s^{n+1} + O(s^{n+2}) \right].$$

The vanishing moments are

$$M_l = 0 \quad \text{for } l = 0, 1, \ldots, L - 1,$$

where $L$ is the number of moments of the wavelet. From the admissibility condition, we have
that the 0th moment, $M_0$, is equal to zero. If the other $M_n$ moments are zero, then $\gamma(s, \tau)$ will
converge to a smooth function $f(t)$. So, if $f(t)$ is described by a polynomial function of degree up
to $(L - 1)$, the term $O(s^{n+2})$ will be zero and small values will appear as a linear combination of
$\psi$ in the function $\gamma(s, \tau)$.
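As a small worked check of the moment definition $M_k = \int t^k\, \psi(t)\, dt$, consider the Haar wavelet. The Haar wavelet is not the basis used in this section; it is assumed here only because its moments can be computed by hand.

```latex
\psi(t) =
\begin{cases}
 1,  & 0 \le t < 1/2, \\
 -1, & 1/2 \le t < 1, \\
 0,  & \text{otherwise,}
\end{cases}
\qquad
M_0 = \int_0^{1/2} dt - \int_{1/2}^{1} dt = \tfrac{1}{2} - \tfrac{1}{2} = 0,
\qquad
M_1 = \int_0^{1/2} t\, dt - \int_{1/2}^{1} t\, dt = \tfrac{1}{8} - \tfrac{3}{8} = -\tfrac{1}{4}.
```

Since $M_0 = 0$ but $M_1 \neq 0$, the Haar wavelet has exactly one null moment ($L = 1$): its coefficients vanish on constant segments of the signal but not on linear ones.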

The regularity degree and decreasing rate of wavelet transforms are related to its number of null 
moments. This property is important to infer the approximation properties in the multi-resolution 
space. When a wavelet has various null moments, there will be coefficients with low values. In 
regions where f (x) is a smooth function, the wavelet coefficients with thin scales in f (x) are null. 

The periodic wavelet transforms are applied to functions on a limited interval. In order to apply
the periodic transform, it is necessary to consider that the limits of the target function are repeated.
With the goal of avoiding this condition, the coiflets basis has the null moments property in its scale
function. The coiflets basis is an extension of the Daubechies ones having the following properties:


$$\int \phi(x)\, dx = 1$$
$$\int x^l\, \phi(x)\, dx = 0, \quad l = 1, \ldots, L - 1$$
$$\int x^l\, \psi(x)\, dx = 0, \quad l = 0, 1, \ldots, L - 1$$


Due to these properties, $\psi$ and $\phi$ have null moments. The function $\phi$ is smoother and more
symmetric than the $\phi$ considered in the Daubechies family. Moreover, this function is a better
approximation of polynomial functions having interpolation properties. Considering $2L$ the number of
coiflets moments and $f(x)$ in $[p, q]$, we have


q q 
Let $f(x)$ be a polynomial function of order $p < 2L - 1$. Then, there is $g(x)$, with the same degree, so that

$$f(x) = \sum_{\tau} g(\tau)\, \phi_{s,\tau}(x).$$

This property states that, during the transformation, some terms can be obtained directly or sampled considering the following approximation error:

$$\left\| f(x) - \sum_{\tau} f(2^{s}\tau)\, \phi_{s,\tau}(x) \right\| = O(2^{sL}). \qquad (14.1)$$

This error has the Dirac $\delta$ property $\int \phi(x) f(x)\, dx = f(0)$ only if it has infinite null moments. Due to this property, the coefficients in $\gamma(s, \tau)$ are sparse representations of $f(x)$ and, consequently, only a few coefficients are necessary to approximate $f(x)$. With this, the scale coefficients in $V_{s}$ can be approximated by sampling of $f(x)$, so that


y(s, T) = f(2- ©) + oO) 


Considering the $L$th order coiflets basis, it is possible to define a fast algorithm using the low error property, when the function $f(t)$ is a smooth function. Each element in $V_{j}[t]$ can be approximated by

$$V_{j}[t] \approx 2^{j/2} f(2^{j} t)$$

and each approximated wavelet coefficient, $W_{j}[t]$ at scale $2^{j}$, is computed as

$$W_{j}[t] = \sum_{n} g[2t - n]\, V_{j}[n].$$


The pseudocode of the wavelet-based algorithm can be seen in Algorithm 14.1 and its complexity analysis is presented in the following text [8]. In this algorithm, h and g are the discrete forms of ψ and φ, respectively; V_j represents the reduced data, in which j represents the resolution, that is, the sampling rate scale 2^j, for example, 2, 4, and so forth.

If M represents the number of decomposition levels, the total number of operations on a data vector of size |V| is of order O(M |V| log |V|). The result of the wavelet transform is the decomposition of the signal into different subspaces, or sub-bands, with different resolutions in time and frequency. A time series is thus decomposed into other series that together compose the original one.
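The filtering loop of Algorithm 14.1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the two-tap Haar low-pass filter instead of a coiflets filter to keep the arithmetic easy to follow, and the function name `filter_and_downsample` is hypothetical.

```python
import math

def filter_and_downsample(v, g):
    """One decomposition level: periodic filtering of v with g, keeping
    one output for every two input elements (|V|/2 outputs)."""
    size = len(v)
    out = []
    for t in range(size // 2):
        u = 2 * t + 1
        acc = g[0] * v[u]
        for n in range(1, len(g)):
            u -= 1
            if u < 0:              # wrap around: the stream is treated as periodic
                u = size - 1
            acc += g[n] * v[u]
        out.append(acc)
    return out

# Assumption: Haar low-pass filter as a stand-in for the coiflets filter g.
haar = [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)]
stream = [1.0, 1.0, 2.0, 2.0, 4.0, 4.0, 8.0, 8.0]

level1 = filter_and_downsample(stream, haar)   # |V|/2 elements
level2 = filter_and_downsample(level1, haar)   # |V|/4 elements (j = 2)
```

Applying the function j times yields a reduced stream of |V|/2^j elements; dividing each value by 2^(j/2) removes the normalization factor and gives values on the scale of the original data.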

The main aspects of this sensor stream algorithm can be summarized as follows: (1) the stream item considered is the buffered sensor data V; (2) the online behavior of the algorithm considers one pass over V, but information from previous processing is not reused; and (3) considering data representativeness as the most important aspect in our applications, the approximation rate is calculated from Equation 14.1.


14.3.2 Sampling Algorithm 


Aquino et al. [6] present a sample-based algorithm that aims to keep the data quality and the sequence of the sensor stream. This algorithm provides a solution that allows the balance between best

Algorithm 14.1: Wavelet-based algorithm
Data: V - original data stream
Data: h, g - Coiflets basis filters
Result: V_j - sampled data stream

1 Generate filters h(t) and g(t);
2 for t ← 0 to (|V|/2 − 1) do
3     u ← 2t + 1;
4     V_j[t] ← g_1 V[u];
5     for n ← 1 to (|g| − 1) do
6         u ← u − 1;
7         if u < 0 then
8             u ← |V| − 1;
9         end
10        V_j[t] ← V_j[t] + g_{n+1} V[u];
11    end

12 Viera) E Vas 

13 end 


data quality and network requirements. The sample size can vary, but it must be representative enough to meet the data similarity requirement.

The sampling algorithm has two versions: random [6] and central [4]. The random one considers 
the data histogram and chooses the samples randomly in each histogram class. The central one 
chooses the central elements of each histogram class. The sampling algorithm can be divided into 
the following steps: 


Step 1: Build a histogram of the sensor stream.

Step 2 (random): Create a sample based on the histogram obtained in Step 1. To create such a sample, we randomly choose the elements of each histogram class, taking into account the sample size and the class frequencies of the histogram. Thus, the resulting sample will be represented by the same histogram.

Step 2 (central): Create a sample based on the histogram obtained in Step 1. To create such a sample, we choose the central elements of each histogram class, taking into account the sample size and the class frequencies of the histogram. Thus, the resulting sample will be represented by the same histogram.

Step 3: Sort the data sample according to its order in the original data.


These steps are illustrated in Figure 14.4. The original sensor stream is composed of |V| elements. The histogram of the sensor stream is built in Step 1. A smaller histogram is built in Step 2, which has the required sample size and keeps the same frequencies as the original histogram. Finally, in Step 3, the elements of the smaller histogram are reordered to keep the data sequence.
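The three steps can be sketched as follows. This is a minimal illustration, assuming equal-width histogram classes and a per-class quota proportional to the class frequency; the function name `histogram_sample` and its parameters (`num_classes`, `mode`, `seed`) are hypothetical and are not taken from Aquino et al.

```python
import math
import random

def histogram_sample(stream, sample_size, num_classes=10, mode="central", seed=0):
    """Histogram-guided sample of `stream` that keeps the arrival order."""
    rng = random.Random(seed)
    lo, hi = min(stream), max(stream)
    width = (hi - lo) / num_classes or 1.0   # avoid a zero width for constant data
    # Step 1: histogram classes holding the positions of their elements.
    classes = [[] for _ in range(num_classes)]
    for pos, value in enumerate(stream):
        c = min(int((value - lo) / width), num_classes - 1)
        classes[c].append(pos)
    chosen = []
    for members in classes:
        if not members:
            continue
        # Step 2: quota per class, proportional to the class frequency.
        quota = min(len(members),
                    math.ceil(len(members) * sample_size / len(stream)))
        members.sort(key=lambda p: stream[p])    # order by value inside the class
        if mode == "random":
            picked = rng.sample(members, quota)
        else:                                    # central elements of the class
            start = (len(members) - quota) // 2
            picked = members[start:start + quota]
        chosen.extend(picked)
    chosen.sort()                                # Step 3: restore arrival order
    return [stream[p] for p in chosen]

values = [i / 100 for i in range(100)]
random.Random(1).shuffle(values)
central = histogram_sample(values, 50, mode="central")
rand = histogram_sample(values, 50, mode="random")
```

Because the quota is rounded up in every class, the sample can be slightly larger than requested when the class frequencies are not multiples of the reduction ratio.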

An execution example of the random and central algorithms and their operations is depicted in Figure 14.5 [4]. In the random sampling (a), we have a “stream in V” with 100 elements, |V′| = 50% of V is randomly chosen (this choice is performed in each histogram class), and then a “stream out V′” is generated with |V′| = 50. In the central sampling (b), we have again a “stream in V” with 100 elements, |V′| = 50% of V is chosen considering the central elements of each histogram class, and then a “stream out V′” is generated with |V′| = 50.

Figure 14.5 Sampling algorithm examples. (a) Random sampling and (b) central sampling.

The pseudocode of the sampling algorithm is given in Algorithm 14.2. 

In Algorithm 14.2, we have two possibilities for the execution of lines 9 and 13, which represent the choice of samples to compose the sampled data stream V′. In the random version, in both lines we have

index ← Random(pr, pr + n′_c),

where the Random function returns an integer in [pr, pr + n′_c]. In the central version, in line 9 we have

index ← pr + ⌈(n_c − n′_c)/2⌉

and in line 13 we have

index ← index + 1.

Algorithm 14.2: Sampling algorithm
Data: V - original data stream
Data: |V′| - sample size
Result: V′ - sampled data stream
1 Sort(V) {based on data values};
2 lg ← “Histogram classes”;
3 pr ← 0 {first index of the first histogram class};
4 n_c ← 0 {number of elements, for each histogram class};
5 k ← 0;
6 for i ← 0 to |V| − 1 do
7     if V[i] > V[pr] + lg or i = |V| − 1 then
8         n′_c ← ⌈n_c |V′|/|V|⌉ {number of elements, for each column in V′};
9         index ← “Index choice following step 2”;
10        for j ← 0 to n′_c do
11            V′[k] ← V[index];
12            k ← k + 1;
13            index ← “Index choice following step 2”;
14        end
15        n_c ← 0;
16        pr ← i;
17    end
18    n_c ← n_c + 1;
19 end
20 Sort(V′) {based on arrival order};


Analyzing Algorithm 14.2, we have

Line 1: Executes in O(|V| log |V|).

Lines 2-5: Correspond to the initialization of variables.

Lines 10-14: Define the inner loop that determines the number of elements of each histogram class of the resulting sample. Consider H_cn as the number of classes of the histograms. The runtime of the inner loop is O(|V′|), where in line 11 n′_c = |V′| and H_cn = 1, that is, we would have a single class in the histogram of the sample with |V′| elements to be covered.

Lines 6-19: Define the outer loop where the input data are read and the sample elements are chosen. H_cn is the number of histogram classes. Before the condition in line 7 is accepted, we execute the outer loop n_c times, which corresponds to counting the number of elements of a class of the original histogram (line 18). After the condition in line 7 is accepted, the inner loop is executed n′_c times, and the condition in this line is accepted just H_cn times. With this, we run H_cn (n_c + n′_c) steps for the outer loop. Since |V| = H_cn n_c and |V′| = H_cn n′_c, we have a runtime for the outer loop of O(|V| + |V′|).

Line 20: Executes in O(|V′| log |V′|).

Thus, the overall complexity of the sampling algorithm is

O(|V| log |V|) + O(|V| + |V′|) + O(|V′| log |V′|) = O(|V| log |V|),

since |V′| ≤ |V|. The space complexity is O(|V| + |V′|) = O(|V|), since we store the input data V and the resulting sample V′. Since each node sends its sample toward the sink, the communication complexity is O(|V′| D), where D is the largest route (in hops) of the network.


14.3.3 Sketch Algorithm 


Aquino et al. [6] present a sketch-based algorithm that aims to keep the frequency of the data values without losses, by using a constant packet size. With this information, the data can be generated artificially in the sink node. However, the sketch solution loses the sequence of the sensor stream. The sketch algorithm can be divided into the following steps:


Step 1: Order the data and identify the minimum and maximum values in the sensor stream.
Step 2: Build the output data, containing only the histogram frequencies.
Step 3: Assemble the sketch stream from the output data and the information about the histogram.


The execution of the algorithm is depicted in Figure 14.6. The original sensor stream is composed of |V| elements. The sensor stream is sorted, and the sketch information is acquired in Step 1. The histogram frequencies are built in Step 2, where |V′| is the number of columns in the histogram. The sketch stream, with the frequencies and sketch information, is created in Step 3.

The pseudocode of the sketch-based algorithm is given in Algorithm 14.3. 

Analyzing Algorithm 14.3 we have 


Line 1: Executes in O(|V| log |V|). 
Lines 2-6: Correspond to the initialization of variables. 
Lines 7-15: Define the loop for the histogram construction and it is executed in O(| V|). 


Thus, the overall time complexity is O(|V| log |V|) + O(|V|) = O(|V| log |V|). The space complexity is O(|V| + |V′|) = O(|V|) if we store the original data stream and the resulting sketch. Since every source node sends its sketch stream toward the sink, the communication complexity is O(|V′| D), where D is the largest route (in hops) in the network.
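The idea of sending only histogram frequencies and regenerating data at the sink can be sketched as below. This is a minimal illustration under assumptions: the sketch carries the minimum value, the class width, and the per-class counts, and the sink regenerates values uniformly inside each class. The function names `build_sketch` and `regenerate` and the dictionary layout are hypothetical.

```python
import random

def build_sketch(stream, width):
    """Source side: sort the stream and count elements per histogram class."""
    data = sorted(stream)
    lowest = data[0]
    n_classes = int((data[-1] - lowest) / width) + 1
    counts = [0] * n_classes
    for value in data:
        counts[min(int((value - lowest) / width), n_classes - 1)] += 1
    return {"min": lowest, "width": width, "counts": counts}

def regenerate(sketch, seed=0):
    """Sink side: draw artificial values class by class (assumed uniform)."""
    rng = random.Random(seed)
    values = []
    for k, count in enumerate(sketch["counts"]):
        left = sketch["min"] + k * sketch["width"]
        values.extend(rng.uniform(left, left + sketch["width"])
                      for _ in range(count))
    return values

readings = [0.1, 0.15, 0.4, 0.9, 0.95, 0.5]
sketch = build_sketch(readings, width=0.25)
synthetic = regenerate(sketch)
```

The regenerated stream has the same histogram as the original one, but the arrival order of the readings is not recoverable, which matches the limitation stated for the sketch solution.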


14.4 Robustness Evaluations 


Once the data are reduced, it is necessary to guarantee its representativeness for each application. 
Considering the infrastructure operations, some mathematical models are used to provide a good 
approximation of the solutions presented. 


Figure 14.6 Sketch algorithm steps (Step 1: stream V; Step 2: sorted stream; Step 3: sketch V′).

Algorithm 14.3: Sketch algorithm
Data: V - original data stream
Result: V′ - sketched data stream
1 Sort(V);
2 lg ← “Histogram classes”;
3 |V′| ← ⌈(V[|V| − 1] − V[0])/lg⌉;
4 pr ← V[0] {first value of the first histogram class};
5 c ← 0 {counter};
6 index ← 0;
7 for i ← 0 to |V| − 1 do
8     if V[i] > pr + lg or i = |V| − 1 then
9         V′[index] ← c;
10        index ← index + 1;
11        c ← 0;
12        pr ← V[i];
13    end
14    c ← c + 1;
15 end


The basic element of robustness is the specific evaluation. For all scenarios considered by Aquino [1], specific evaluations for data quality were performed. These evaluations are required because, when the data are reduced, it is necessary to identify the error generated. Statistical techniques are used to identify the robustness of reduction algorithms. Furthermore, it is important to consider other aspects, for example, mathematical models to optimize some infrastructure or application parameters.


The other elements related to robustness are discussed as follows: 


Network optimization: Wireless sensor network applications are always interested in minimizing the network parameters. When data reduction is used, the data quality must also be maximized. For each network parameter, it is important to determine mathematically the ideal amount of data to be propagated. Considering this element, the main question is


How can the quality of the reduced data and the savings in network parameters both be ensured?


Based on this question, the following assumption can be considered: Optimization models can be used to allow the proposition of distributed heuristics for each application.
Previous work on this question can be found in Ruela et al. [11,12]. These works consider exact methods, such as Benders decomposition, and heuristics, such as evolutionary algorithms, to tackle the problem. They do not consider data reduction as an optimization parameter.

Methodology of validation: Once the data reduction is performed, it is important to identify the error for each input data set and each reduction technique. Considering this element, the main question is


Which is the best approach, a specific or a general strategy, to estimate the data quality?

Based on this question, the following assumption can be considered: Assuming that we have a good data characterization, the specific strategy is more appropriate. Otherwise, the general strategy is more suitable.

Previous work on this question can be found in Aquino [1], which presents some strategies based on statistical methods that assess the data representativeness considering the data distribution and data values. In addition, the work proposed by Frery et al. [2] presents a representativeness evaluation based on data reconstruction.


For instance, consider the specific evaluation presented in Aquino [6]. This specific evaluation 
considers two analyses: (1) the distribution approximation between the original and sampled 
streams; and (2) the discrepancy of the values in the sampled streams. 

The distribution approximation could be identified by using the Kolmogorov-Smirnov test 
(K-S test) [13,14]. This test evaluates if two samples have similar distributions, and it is not 
restricted to samples following a normal distribution. The K-S test is described as follows: 


1. Build the cumulative distribution Fn of V and V′ using the same classes.
2. Calculate the accumulated differences for each point and consider the largest one (Dmax).
3. Calculate the critical value

$$D_{crit} = y \sqrt{\frac{|V| + |V'|}{|V|\,|V'|}}$$

where y is a tabulated value that depends on the significance level.
4. The samples follow the same distribution if

$$D_{max} \le D_{crit}$$


Consider Figure 14.7, which shows the comparison between the Fn distributions, with |V| = 256 and |V′| = {log₂ |V|, |V|/2}, where V′ ⊂ V. In both cases, according to the K-S test, V′ follows the same distribution as V.
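The four steps of the K-S test can be sketched as follows. This is a minimal illustration that assumes empirical cumulative distributions evaluated at every observed value (instead of shared histogram classes) and the commonly tabulated coefficient y = 1.36 for a 5% significance level; the function names `ks_statistic` and `ks_critical` are hypothetical.

```python
import bisect
import math

def ks_statistic(a, b):
    """Largest absolute difference between the two empirical CDFs (Dmax)."""
    sa, sb = sorted(a), sorted(b)
    d_max = 0.0
    for x in sa + sb:
        fa = bisect.bisect_right(sa, x) / len(sa)
        fb = bisect.bisect_right(sb, x) / len(sb)
        d_max = max(d_max, abs(fa - fb))
    return d_max

def ks_critical(n, m, y=1.36):
    """Critical value Dcrit = y * sqrt((n + m) / (n * m)); y = 1.36 is assumed."""
    return y * math.sqrt((n + m) / (n * m))

original = [i / 256 for i in range(256)]      # |V| = 256
sample = original[::2]                        # |V'| = |V|/2
shifted = [x + 0.5 for x in sample]           # clearly different distribution

d_same = ks_statistic(original, sample)
d_diff = ks_statistic(original, shifted)
d_crit = ks_critical(len(original), len(sample))
same_distribution = d_same <= d_crit
```

In this example the regular sample passes the test, while the shifted sample exceeds the critical value and is rejected.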

Figure 14.7 Accumulated distribution function. (a) Log of data (|V| vs. log |V|) and (b) half of data (|V| vs. |V|/2).

To evaluate the discrepancy of the values in the sampled streams, that is, whether they still represent the original stream, the relative absolute error is calculated. This is the absolute value of the largest distance between the average of the original data and the lower or upper limit of the 95% confidence interval of the sampled data average. The average of the original data is $\bar{V}$, and the confidence interval over the sampled data average $\bar{V}'$ is $IC = [v_{inf}, v_{sup}]$. The evaluation steps are described as follows:

1. Calculate the averages of the original and reduced values, $\bar{V}$ and $\bar{V}'$, respectively.
2. Calculate the confidence interval $IC$ with confidence of 95% for $\bar{V}'$.
3. Calculate the relative absolute error

$$\epsilon = \max\{|v_{inf} - \bar{V}|,\ |v_{sup} - \bar{V}|\}.$$
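These evaluation steps can be sketched as below. This is a minimal illustration that assumes a normal approximation for the 95% confidence interval (z = 1.96) computed from the sample standard deviation; the chapter does not state how the interval is obtained, and the function names are hypothetical.

```python
import math
import statistics

def confidence_interval_95(sample):
    """Return (v_inf, v_sup) for the sample mean, assuming z = 1.96."""
    mean = statistics.mean(sample)
    half_width = 1.96 * statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - half_width, mean + half_width

def discrepancy(original, sample):
    """Largest distance between the original mean and the interval limits."""
    original_mean = statistics.mean(original)
    v_inf, v_sup = confidence_interval_95(sample)
    return max(abs(v_inf - original_mean), abs(v_sup - original_mean))

original = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
sample = [1.0, 3.0, 5.0, 7.0]
eps = discrepancy(original, sample)
```

The error grows both when the sample mean drifts away from the original mean and when the interval widens, which happens for small samples.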


14.5 Conception by Simulations 


The basic element of the conception phase is the development tools. Lins et al. [15,16] propose a tool for the development of monitoring applications for wireless sensor networks. This tool allows the fast prototyping of data reduction applications.

Some improvements in the simulation tools are necessary, as is the proposition of adequate methodologies to assess the application deployment in real scenarios. The other elements related to conception are discussed as follows:


Simulation: Several solutions for sensor networks are assessed by simulations, for example, with Network Simulator* and Sinalgo.† The reduction solutions must be available in the methodology through these simulators. In this way, it is possible to integrate new solutions and, consequently, to test and to validate them. The simulators must be extended with, for example, sensed data generators, data traffic generators, and specific simulation traces to allow a better analysis of the results. Considering only the reduction algorithms, it is important to provide a test tool to assess the data quality separately.

Deployment: Considering the TinyOS operating system,‡ it is possible to develop a basic NesC infrastructure that makes it easy to build NesC-based reduction applications. Both the present solutions and the new ones can be tested in a real sensor node. Moreover, the development tool proposed by Lins et al. [15,16] could consider the reduction algorithms and solutions, allowing the development of reduction applications based on NesC.


The main aspect of this phase is the simulation element. Some simulation results and analyses 
of the wavelet-, sampling-, and sketch-based algorithms are presented in the following text. 


14.5.1 Wavelets Algorithm Simulation 


Aquino et al. [8] evaluate the impact of the data stream wavelet algorithm © to reduce the 
data sensed V. Considering a previous characterization, two scenarios are considered: with and 
without an event. Figure 14.8 illustrates these scenarios. To simulate the event, we cause a random 
perturbation in V so that the values change drastically. 

Initially, the wavelet transform with j = 2 is considered, using only the V_j values, in which the samples are taken in 2² steps; it is based on coiflets filters and |V′| = |V|/4 [8]. In order to compare this strategy, a simple static sampling is used with the same steps, that is, the sampling considers 2² static steps instead of the coiflets filters. Figure 14.9 shows the results when there is no external event. The results were sampled from the data shown in Figure 14.8a, and the proposed strategy is represented by a continuous line that indicates $|V_{j}| \times |2^{-j/2} V|$, where $2^{j/2}$ is the normalization factor at $\phi_{j,t}(n) = 2^{j/2} \phi(2^{j} n - t)$.

* http://nsnam.isi.edu/nsnam/index.php/Main_Page
† http://dcg.ethz.ch/projects/sinalgo/
‡ http://www.tinyos.net/

Figure 14.8 Input data for wavelets algorithm. (a) Without events and (b) with events.

Figure 14.9 Sampling without event presence. The reduction is 1/4 of original data.

In contrast, as shown in Figure 14.10, when we consider the presence of some event (Figure 14.8b), the wavelet-based sampling is able to detect it. This occurs because the coiflets transform is used here to detect the changes in the sampling. It is important to highlight that the error in both cases, Figures 14.9 and 14.10, can be obtained from Equation 14.1.


14.5.2 Sampling and Sketch Algorithms Simulations 


Aquino et al. [6] simulate the sampling and sketch algorithms. The simulations are based on the 
following considerations: 


■ The evaluation is performed through simulations and uses NS-2 (Network Simulator 2) version 2.33. Each simulated scenario was executed with 33 random scenes. At the end, for each scenario we plot the average value with a 95% confidence interval.

■ A tree-based routing algorithm called EF-Tree is used [17]. The density is controlled and all nodes have the same hardware configuration. To analyze only the application, the tree is built just once before the traffic starts.

■ The data streams used by the nodes are always the same, following a normal distribution, where the values are in [0.0; 1.0], and the periodicity of generation is 60 s. The size of the data packet is 20 bytes. For larger samples, these packets are fragmented by the sources and reassembled at the reception.

■ The stream size is varied and the application and network behavior is analyzed by using sample sizes of |V|, |V|/2, and log₂ |V|.

Figure 14.10 Sampling with event presence. The reduction is 1/4 of original data.


Figure 14.11 Energy network behavior.

Figure 14.12 K-S data behavior.

Figure 14.11 [6] shows the energy consumption when the sensor stream size varies. We observe, in the sampling solution, that when the sample size is diminished the consumed energy diminishes accordingly. The sketch solution follows the log n sample result. This occurs because the packet size is constant and close to that of the sample, that is, a log n packet size.

The log n sample and the sketch have the best performance in all cases, and their energy consumption does not vary when the sample size increases. In the log n sample, this occurs because the packet size is increased only when we increase the sensor stream size (256, 512, 1024, 2048). In the sketch case, the packet size used is always constant. The other results (samples of n/2 and n) have a worse performance because the packet size is increased proportionally when the sensor stream size is increased.
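The difference in growth between the three sample sizes can be made concrete with a short computation. This is only an arithmetic illustration for the stream sizes quoted above; the dictionary name `sizes` is hypothetical, and no packet counts are derived because the size of each encoded value is not given in the text.

```python
import math

# Number of elements sent for each stream size n, for the three sample sizes.
sizes = {
    n: {"n": n, "n/2": n // 2, "log2 n": int(math.log2(n))}
    for n in (256, 512, 1024, 2048)
}

for n, row in sizes.items():
    print(n, row)
```

Going from 256 to 2048 elements multiplies the n and n/2 samples by eight, while the log₂ n sample grows only from 8 to 11 elements, which is consistent with the nearly flat energy curve reported for that case.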

Now, consider the impact of the solutions on the data quality [6]. This evaluation is only for the sampling solution, because this solution loses information in its process, and therefore it is important to evaluate its impact on the data quality. In the sketch case, all data can be generated artificially when the sketch arrives at the sink node, and therefore the losses are not identified when the data tests are applied. The only impact generated by the sketch solution is the loss of the data sequence, which is not evaluated in this work.

Figure 14.12 shows the similarity between the original and sampled stream distributions. The difference between them is called K-S-diff.

The results show that when the sample size is decreased, the K-S-diff increases. Because the data streams are generated in [0.0; 1.0], K-S-diff = 20% for log n sample sizes, and K-S-diff = 10% for n/2 sample sizes. In all cases, the error is constant, since the data loss is small. The higher error occurs when we use a smaller sample size; however, the data similarity is kept.


14.6 Summary 


This chapter addresses some sensor stream reduction issues. A methodology suitable for sensor 
stream applications is presented. This methodology considers the following phases: 


Characterization: This phase provides the requirements list of sensor stream reduction applications. It comprises the following elements: reduction architecture, data sensor characterization, and application requirements.

Reduction tools: This phase provides support to reduction applications development. It comprises the following elements: reduction API, infrastructure, and applications.

Robustness: This phase provides models that allow the algorithms validation. It comprises the following elements: specific evaluations, network optimization, and methodology of validation.

Conception: This phase integrates the simulation and deployment mechanisms. It comprises the following elements: development tools, simulation, and deployment.

15.1 Introduction


In this chapter, we discuss the application of a new compression technique called compressive 
sensing (CS) in wireless sensor networks (WSNs). The objective of a WSN we assume in this 
chapter is to collect information about events occurring in a region of interest. This WSN consists 
of a large number of wireless sensor nodes and a central fusion center (FC). The sensor nodes are 
spatially distributed over the said region to acquire physical signals such as sound, temperature, 
wind speed, pressure, and seismic vibrations. After sensing, they transmit the measured signals to 
the FC. In this chapter, we focus on the role of the FC, which is to recover the transmitted signals 
in their original waveforms for further processing. By doing so, the FC can produce a global picture 
that illustrates the event occurring in the sensed region. Each sensor uses its onboard battery for 
sensing activities and makes reports to FC via wireless transmissions. Thus, limited power at the 
sensor nodes is the key problem to be resolved in the said WSN. 

CS is a signal acquisition and compression framework recently developed in the field of signal 
processing and information theory [1,2]. Donoho [1] says that “The Shannon—Nyquist sampling 
rate may lead to too many samples; probably not all of them are necessary to reconstruct the given 
signal. Therefore, compression may become necessary prior to storage or transmission.” According 
to Baraniuk [3], CS provides a new method of acquiring compressible signals at a rate significantly 
below the Nyquist rate. This method employs nonadaptive linear projections that preserve the 
signal’s structure; the compressed signal is then reconstructed from these projections using an 
optimization process. There are two tutorial articles good for further reading on CS [3,4] published 
in the IEEE Signal Processing Magazine in 2007 and 2008, respectively. 

Our aim in this chapter is to determine whether the CS can be used as a useful framework for 
the aforementioned WSN to compress and acquire signals and save transmittal and computational 
power at the sensor node. This CS-based signal acquisition and compression are done by a simple 
linear projection at each sensor node. Then, each sensor transmits the compressed samples to the 
FC; the FC, which collects the compressed signals from the sensors, jointly reconstructs the signal 
in polynomial time using a signal recovery algorithm. Illustrating this process in detail throughout 
this chapter, we check to see if CS can become an effective, efficient strategy to be employed in 
WSNs, especially for those with low-quality, inexpensive sensors. 

In this chapter, as we assume a scenario in which a WSN is used for signal acquisition, we devote some effort to modeling the correlation among the signals acquired from the sensors. We
discuss a few signal projection methods suggested in the literature that are known to give a good 
signal recovery performance from the compressed measurements. We also investigate a couple of 
well-known signal recovery algorithms such as the orthogonal matching pursuit (OMP) (greedy 
approach) [13,14] and the primal-dual interior point method (PDIP) (gradient-type approach) 
[5]. Finally, we simulate the considered WSN system and examine how the presence of signal 
correlation can be exploited in the CS recovery routine and help reduce the amount of signal 
samples to be transmitted at the sensor node. 


15.2 Compressed Sensing: What Is It? 


Compressive Sensing and Its Application in Wireless Sensor Networks

Figure 15.1 Conventional compression and compressive sensing. Conventional compression: signal sensing, analog-to-digital converter, save the sampled signal, data compression, compressed samples. Compressive sensing: signal sensing, compressive sensing, compressed samples.

In a conventional communication system, an analog-to-digital converter based on the Shannon-Nyquist sampling theorem is used to convert analog signals to digital signals. The theorem says that if a signal is sampled at a rate of at least twice the maximum frequency of the signal, the
original signal can be exactly recovered from the samples. Once the sampled signals are obtained 
over a fixed duration of time, a conventional compression scheme can be used to compress them. 
Because the sampled signals often have substantial redundancy, compression is possible. Several 
compression schemes follow this approach, for example, the MP3 and JPEG formats for audio or 
image data. However, conventional compression in a digital system is sometimes inefficient because 
it requires unnecessary signal processing stages, for example, retaining all of the sampled signals in 
one location before data compression. According to Donoho [1], the CS framework, as shown in 
Figure 15.1, can bypass these intermediate steps and thus provides a light-weight signal acquisition 
apparatus that is suitable for those sensor nodes in our WSN. 

CS provides a direct method that acquires compressed samples without going through the intermediate stages of conventional compression. Thus, CS provides a much simpler signal acquisition solution. In addition, CS provides several recovery routines with which the original signal can be regenerated perfectly from the compressed samples.


15.2.1 Background 


Let a real-valued column vector $s$ be a signal to be acquired. Let it be represented by

$$s = \Psi x \qquad (15.1)$$

where $x, s \in \mathbb{R}^{n}$, and $x$ is also a real-valued column vector. The matrix $\Psi \in \mathbb{R}^{n \times n}$ is an orthonormal basis, i.e., $\Psi^{T}\Psi = \Psi\Psi^{T} = I_{n}$, the identity matrix of size $n \times n$. The signal $s$ is called $k$-sparse if it can be represented as a linear combination of only $k$ columns of $\Psi$, i.e., only $k$ components of the vector $x$ are nonzero, as represented in Equation 15.2:

$$s = \sum_{i=1}^{n} x_{i} \psi_{i}, \quad \text{where } \psi_{i} \text{ is a column vector of } \Psi \qquad (15.2)$$

A signal is called compressible if it has only a few significant (large in magnitude) components and a greater number of insignificant (close to zero) components. The compressive measurements $y$ (compressed samples) are obtained via linear projections as follows:

$$y = \Phi s = \Phi \Psi x = A x \qquad (15.3)$$

where the measurement vector is $y \in \mathbb{R}^{m}$, with $m < n$, and the measurement matrix $A \in \mathbb{R}^{m \times n}$. Our goal is to recover $x$ from the measurement vector $y$. We note that Equation 15.3 is an underdetermined system because it has fewer equations than unknowns; thus, it does not have a unique solution in general. However, the theory of CS asserts that, if the vector $x$ is sufficiently sparse, an underdetermined system is guaranteed with high probability to have a unique solution.

In this section, we discuss the basics of CS in more detail. 


1. k-sparse signal x in orthonormal basis


The $k$-sparse signal, $s$ in Equation 15.1, has $k$ nonzero components in $x$. The matrix $\Psi$ is, again, an orthonormal basis, i.e., $\Psi^{T}\Psi = \Psi\Psi^{T} = I_{n}$, the identity matrix of size $n \times n$.


2. Measurement vector y and underdetermined system


The sensing matrices $\Phi$ and $A$ in Equation 15.3 are $m$ by $n$. Note that $m$ is less than $n$ and thus Equation 15.3 is an underdetermined system of equations. When $m$ measurements are still sufficient for signal recovery, a compression effect exists. A good signal recovery with $m < n$ in this underdetermined system of equations is possible with the additional information that the signal is $k$-sparse. While we address the details of the signal recovery problem in Section 15.2.2, we may consider a simpler case for now. Suppose $x$ is $k$-sparse and the locations of the $k$ nonzero elements are known. Then, we can form a simplified equation by deleting all those columns and elements corresponding to the zero elements, as follows:

$$y = A_{K} x_{K} \qquad (15.4)$$

where $K \subseteq \{1, 2, \ldots, n\}$ is the support set, which is a collection of indices corresponding to the nonzero elements of $x$. Note that the support set $K$ can be any size-$k$ subset of the full index set, $\{1, 2, 3, \ldots, n\}$. Equation 15.4 has the unique solution $x_{K}$ if the columns of $A_{K}$ are linearly independent. The solution can be found using

$$x_{K} = (A_{K}^{T} A_{K})^{-1} A_{K}^{T} y \qquad (15.5)$$

Thus, if the support set $K$ can be found, the problem is easy to solve provided the columns are linearly independent.
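The known-support case of Equation 15.5 can be worked through on a small example. This is a minimal sketch in plain Python, assuming a hand-picked 3 by 5 matrix and a 2-sparse vector; the function name `solve_on_support` is hypothetical, indices are 0-based (unlike the 1-based index set in the text), and the normal equations are solved by Gaussian elimination rather than by forming an explicit inverse.

```python
def solve_on_support(A, y, support):
    """Least-squares solution of y = A_K x_K via the normal equations."""
    cols = [[row[j] for row in A] for j in support]      # columns of A_K
    k = len(cols)
    gram = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
            for i in range(k)]                            # A_K^T A_K
    rhs = [sum(a * b for a, b in zip(cols[i], y)) for i in range(k)]  # A_K^T y
    # Gaussian elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(gram[r][c]))
        gram[c], gram[pivot] = gram[pivot], gram[c]
        rhs[c], rhs[pivot] = rhs[pivot], rhs[c]
        for r in range(c + 1, k):
            factor = gram[r][c] / gram[c][c]
            for cc in range(c, k):
                gram[r][cc] -= factor * gram[c][cc]
            rhs[r] -= factor * rhs[c]
    # Back substitution.
    x_k = [0.0] * k
    for r in range(k - 1, -1, -1):
        tail = sum(gram[r][cc] * x_k[cc] for cc in range(r + 1, k))
        x_k[r] = (rhs[r] - tail) / gram[r][r]
    return x_k

# m = 3 measurements of an n = 5 vector with k = 2 nonzero entries.
A = [[1, 0, 2, 1, 0],
     [0, 1, 1, -1, 3],
     [2, 1, 0, 1, 1]]
x_true = [0, 2, 0, -1, 0]
y = [sum(a * b for a, b in zip(row, x_true)) for row in A]
x_k = solve_on_support(A, y, support=[1, 3])
```

With the support known, three measurements are enough to recover the two nonzero entries exactly; the hard part of CS recovery is finding the support itself.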


3. Incoherence condition 


The incoherence condition is that the rows of $\Phi$ should be incoherent to the columns of $\Psi$. If the 
rows of $\Phi$ are coherent to the columns of $\Psi$, the matrix A cannot be a good sensing matrix. In the 
extreme case, we can show a matrix A having m rows of $\Phi$ that are the first m columns of $\Psi$: 


$$A = \Phi\Psi = \begin{bmatrix} I_m & 0_{m \times (n-m)} \end{bmatrix} \qquad (15.6)$$

If A of Equation 15.6 is used as the sensing matrix, the compressed measurement vector y captures 
only the first m elements of the vector x, and the rest of the information contained in x is 
completely lost. 
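The loss of information in this extreme case can be illustrated with a small sketch. It assumes a sensing matrix that simply keeps the first m entries of x, which is the behavior described for Equation 15.6; the signal values are made up for illustration.

```python
import numpy as np

m, n = 3, 6
# Coherent extreme case: A keeps the first m entries of x and discards the rest
A = np.hstack([np.eye(m), np.zeros((m, n - m))])

x1 = np.array([0.0, 0.0, 0.0, 0.0, 2.5, 0.0])    # 1-sparse, nonzero outside the first m entries
x2 = np.array([0.0, 0.0, 0.0, -1.0, 0.0, 0.0])   # a different 1-sparse signal

y1 = A @ x1
y2 = A @ x2
print(y1, y2)   # both measurement vectors are all zeros
```

Two different sparse signals produce identical measurements, so no recovery algorithm can tell them apart from y alone.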


4. Designing a sensing matrix 


One choice for the sensing matrix is a Gaussian matrix, i.e., a matrix whose elements are 
independent and identically distributed Gaussian samples. This choice is deemed good since a 
Gaussian sensing matrix satisfies the incoherence condition with high probability for any choice 
of orthonormal basis. This randomly generated matrix acts as a random projection operator on the 
signal vector x. Such a random projection matrix need not depend on specific knowledge about the 
source signals. Moreover, random projections have the following advantages in the application to 
sensor networks [6]: 


1. Universal incoherence: Random matrices can be combined with all conventional sparsity 
bases $\Psi$, and, with high probability, sparse signals can be recovered from the 
measurements y by an L1 minimization algorithm. 

2. Data independence: The construction of a random matrix does not depend on any prior 
knowledge of the data. Therefore, given an explicit random number generator, only the 
sensors and the FC are required to agree on a single random seed for generating the same 
random matrices of any dimension. 

3. Robustness: Transmission of randomly projected coefficients is robust to packet loss in the 
network. Even if part of the elements in measurement y is lost, the receiver can still recover 
the sparse signal, at the cost of lower accuracy. 
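The data-independence property can be sketched as follows: if the sensor and the FC agree on one seed, both can regenerate the same Gaussian matrix locally, so the matrix itself never has to be transmitted. The function name, the seed value, and the 1/sqrt(m) scaling are assumptions made for this illustration.

```python
import numpy as np

def sensing_matrix(seed, m, n):
    """Return an m x n matrix of i.i.d. Gaussian entries from a seeded generator.

    The 1/sqrt(m) scaling is a common convention and an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    return rng.standard_normal((m, n)) / np.sqrt(m)

SHARED_SEED = 2024                                   # hypothetical value agreed on in advance
A_sensor = sensing_matrix(SHARED_SEED, m=16, n=64)   # generated at the sensor
A_fc = sensing_matrix(SHARED_SEED, m=16, n=64)       # regenerated at the FC

print("identical matrices:", np.array_equal(A_sensor, A_fc))
```

Under the same convention, a lost measurement can be handled at the FC by deleting the corresponding row of the regenerated matrix before recovery.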


15.2.2 L0, L1, and L2 Norms 


In CS, a core problem is to find a unique solution for an underdetermined equation. This problem 
is related to the signal reconstruction algorithm, which takes the measurement vector y as an 
input and the k-sparse vector x as an output. To solve an underdetermined problem, we consider 
minimization criteria using different norms such as the L2, L1, and L0 norms. The Lp norm of a 
vector x of length n is defined as 


$$\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}, \quad p > 0 \qquad (15.7)$$


Although we can define the L2 and L1 norms as $\|x\|_2 = \left(\sum_{i=1}^{n} |x_i|^2\right)^{1/2}$ and $\|x\|_1 = \sum_{i=1}^{n} |x_i|$, 
respectively, using the definition of the Lp norm, the L0 norm cannot be defined this way. The L0 norm is 
a pseudo-norm that counts the number of nonzero components in a vector, as defined by Donoho 
and Elad [7]. Using this definition of norms, we will discuss the minimization problem. 
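The three quantities can be computed directly, as in the short sketch below. The helper names and the optional tolerance used to decide what counts as nonzero are illustrative assumptions.

```python
import numpy as np

def lp_norm(x, p):
    """L_p norm of Equation 15.7 for p > 0."""
    return float(np.sum(np.abs(x) ** p) ** (1.0 / p))

def l0_pseudo_norm(x, tol=0.0):
    """Count entries whose magnitude exceeds tol (tol is an illustrative parameter)."""
    return int(np.count_nonzero(np.abs(x) > tol))

x = np.array([3.0, 0.0, -4.0, 0.0])
print("L2:", lp_norm(x, 2))        # square root of 9 + 16
print("L1:", lp_norm(x, 1))        # 3 + 4
print("L0:", l0_pseudo_norm(x))    # two nonzero entries
```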


1. L2 norm minimization 


$$(L_2)\quad \hat{x} = \arg\min_{x} \|x\|_2 \ \text{ subject to } y = Ax, \ \text{ where } A \in \mathbb{R}^{m \times n},\ \operatorname{rank}(A) = m$$


$$\hat{x} = A^{T} (A A^{T})^{-1} y \qquad (15.8)$$

However, this conventional solution is generally not sparse, so it is not appropriate as 
a solution to the CS problem. 
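A small numerical sketch makes this concrete. It builds a 3-sparse signal, forms the minimum L2 norm solution of Equation 15.8, and counts how many of its entries are non-negligible. The sizes, the seed, and the 1e-6 threshold are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 10, 40
A = rng.standard_normal((m, n))            # full row rank with probability one

x = np.zeros(n)
x[[3, 17, 29]] = [1.0, -2.0, 0.5]          # 3-sparse ground truth
y = A @ x

# Equation 15.8: minimum L2 norm solution, solved without an explicit inverse
x_l2 = A.T @ np.linalg.solve(A @ A.T, y)

print("nonzeros in the true x:", np.count_nonzero(x))
print("entries of x_l2 above 1e-6:", int(np.sum(np.abs(x_l2) > 1e-6)))
print("measurement mismatch:", np.linalg.norm(A @ x_l2 - y))
```

The solution matches the measurements and has a smaller L2 norm than the true signal, but its energy is spread over many entries instead of being concentrated on the true support.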

2. L0 norm minimization 


$$(L_0)\quad \text{Minimize } \|x\|_0 \ \text{ subject to } y = Ax, \ \text{ where } A \in \mathbb{R}^{m \times n},\ \operatorname{rank}(A) = m \qquad (15.9)$$


The L0 norm of a vector is, by definition, the number of nonzero elements in the vector. 
In the CS literature, it is known that the L0 norm problem can be solved by examining all 
the possible cases. Since this process involves a combinatorial search over all $\binom{n}{k}$ 
possible support sets, the problem is NP-hard, and no polynomial-time algorithm for it is 
known. Therefore, we consider L1 norm minimization as an alternative. 


3. L1 norm minimization 

$$(L_1)\quad \text{Minimize } \|x\|_1 \ \text{ subject to } y = Ax, \ \text{ where } A \in \mathbb{R}^{m \times n},\ \operatorname{rank}(A) = m \qquad (15.10)$$


This L1 norm minimization can be considered as a relaxed version of the L0 problem. Fortunately, 
the L1 problem is a convex optimization problem and in fact can be recast as a linear programming 
problem. For example, it can be solved by an interior point method. Many effective algorithms 
have been developed to solve the L1 minimization problem, and they will be considered later in this 
chapter. Here, we aim to study the sufficient conditions under which Equations 15.9 and 15.10 
have unique solutions. We provide a theorem related to this issue. 


Theorem 15.1 (L0/L1 equivalence condition) Let $A \in \mathbb{R}^{m \times n}$ be a matrix with maximum 
correlation $\mu$, defined as $\mu(A) = \max_{i \neq j} |\langle a_i, a_j \rangle|$, where $a_i$ is the ith column vector of A with 
i = 1, 2, ..., n, and let x be a k-sparse signal. Then, if $k < \frac{1}{2}\left(1 + \frac{1}{\mu}\right)$ is satisfied, the solution of L1 
coincides with that of L0 [7]. 
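The quantities in Theorem 15.1 can be evaluated numerically, as in the sketch below. It assumes that the columns of A are normalized to unit length before the inner products are taken, which is the usual convention for this kind of bound but is not stated explicitly above; the function names are illustrative.

```python
import numpy as np

def mutual_coherence(A):
    """Largest absolute inner product between distinct unit-normalized columns of A."""
    A_unit = A / np.linalg.norm(A, axis=0, keepdims=True)
    gram = np.abs(A_unit.T @ A_unit)
    np.fill_diagonal(gram, 0.0)            # ignore the i = j terms
    return float(gram.max())

def sparsity_bound(mu):
    """Right-hand side of the condition k < 0.5 * (1 + 1/mu) in Theorem 15.1."""
    return 0.5 * (1.0 + 1.0 / mu)

rng = np.random.default_rng(2)
A = rng.standard_normal((64, 128))
mu = mutual_coherence(A)
print(f"mu = {mu:.3f}, condition holds for k < {sparsity_bound(mu):.2f}")
```

For random matrices of moderate size, this bound is typically conservative: it is a sufficient condition, and recovery often succeeds for larger k in practice.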


15.3 Wireless Sensor Networks 
15.3.1 Network Structure 


We consider a WSN consisting of a large number of wireless sensor nodes and one FC (Figure 
15.2). The wireless sensor nodes are spatially distributed over a region of interest and observe 
physical changes such as those in sound, temperature, pressure, or seismic vibrations. If a specific 
event occurs in a region of distributed sensors, each sensor makes local observations of the physical 
phenomenon as a result of this event taking place. An example of sensor network applications is area 
monitoring to detect forest fires. A network of sensor nodes can be installed in a forest to detect when 
a fire breaks out. The nodes can be equipped with sensors to measure temperature, humidity, and 
the gases produced by fires in trees or vegetation [8]. Other examples include military and security 
applications. Military applications vary from monitoring soldiers in the field to tracking vehicles 
or enemy movement. Sensors attached to soldiers, vehicles, and equipment can gather information 
about their condition and location to help plan activities on the battlefield. Seismic, acoustic, 
and video sensors can be deployed to monitor critical terrain and approach routes; reconnaissance 
of enemy terrain and forces can be carried out [9]. 
Figure 15.2 Wireless sensor network. 


After sensors observe an event taking place in a distributed region, they convert the sensed 
information into a digital signal and transmit the digitized signal to the FC. Finally, the FC 
assembles the data transmitted by all the sensors and decodes the original information. The 
decoded information at the FC provides a global picture of events occurring in the region of 
interest. Therefore, we assume that the objective of the sensor network is to reconstruct the 
transmitted information, and hence the original signal, accurately and rapidly. 

We discuss the resource limitations of WSNs in the next section. 


15.3.2 Resource Limitations in WSNs 


In this section, we describe the assumptions made about the sensor network we are interested in. 
We assume that the sensors are distributed and communicate with the FC through a 
wireless channel. Because each sensor observes events only locally, sensors 
are typically deployed in large numbers over the region of interest. Therefore, they are usually 
designed to be inexpensive and small. For that reason, each sensor operates on an onboard battery 
that is not rechargeable, and the hardware of a sensor node can 
provide only limited computational performance, bandwidth, and transmission power. In contrast 
to the sensor nodes, the FC has powerful computational 
performance and plentiful energy, so it naturally performs most of the complex computations. 

Under the limited conditions stated earlier for a WSN, CS can substantially reduce the data 
volume to be transmitted at each sensor node. With CS, it is possible to compress 
the original signal using only O(k log(n/k)) samples without going through many complex signal 
processing steps. These signals can be recovered successfully at the FC. All of this is done under 
the CS framework. As a result, the power consumed for transmitting signal contents at 
each sensor can be significantly reduced thanks to the decreased data volume. Further, it should be 
noted that this data reduction comes without utilizing onboard signal processing units, since all 
the intermediate signal processing steps shown in Figure 15.1 are not needed. Namely, the sensor 
nodes can compress the signal without spending any power on running complex compression 
algorithms onboard. 
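To give a feel for the O(k log(n/k)) scaling, the following sketch evaluates c * k * log(n/k) for one example. The constant c, the use of the natural logarithm, and the values of n and k are all assumptions; the order notation fixes neither the constant nor the base.

```python
import math

def cs_measurements(n, k, c=1.0):
    """Rough measurement count c * k * ln(n / k); c is an unspecified constant."""
    return math.ceil(c * k * math.log(n / k))

n, k = 1024, 16
m = cs_measurements(n, k)
print(f"n = {n}, k = {k}: about {m} measurements, ratio m/n = {m / n:.3f}")
```

Even with a constant several times larger than one, the number of transmitted samples stays well below n when the signal is sparse enough.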


15.3.3 Usefulness of CS in WSNs 


In this section, we provide a brief comparison of using CS and using the conventional compression 
in a WSN. This comparison illustrates why CS could be a useful solution for WSNs. 
Figure 15.3 Conventional sensor network scheme. 


1. Sensor network scheme with conventional compression 


For a conventional sensor system, suppose that the system designer has decided to gather all the 
uncompressed samples at a single location, say one of the sensors, in order to exploit inter-sensor 
correlation. See diagram shown in Figure 15.3. At the collection point, joint compression can be 
made and compressed information can be sent to the FC. 

This option has several drawbacks. First, gathering the samples from all the sensors and jointly 
compressing them causes a transmission delay. Second, a lot of onboard power must be spent at the 
collection point. Third, the sensors must be located close to one another so that the transmitted 
information can be gathered at the collection point. 

Now, suppose instead that joint compression is not attempted and each sensor compresses 
its signal on its own. The data reduction effect of this approach will be limited because 
inter-sensor correlation is not exploited at all. The total volume of the independently compressed 
data is much larger than that of jointly compressed data. This may produce a large traffic volume 
in the WSN, and a large amount of transmission power will be wasted by sensor nodes that 
transmit essentially the same information to the FC. Thus, this is an inefficient strategy as well. 


2. Sensor network scheme with CS 


In contrast to the conventional schemes considered in the previous paragraphs, the CS method aims 
to acquire compressed samples directly. If a high-dimensional observation vector x exhibits sparsity 
in a certain domain (by exploiting intra-sensor correlation), CS provides a direct method for 
signal compression, as discussed in Figure 15.1. To compress the high-dimensional signal x into a 
low-dimensional signal y, as in Equation 15.3, it uses a simple matrix multiplication with an m x n 
projection matrix $A_j$, $j \in \{1, 2, \ldots, J\}$, where j is the sensor index, as depicted in Figure 15.4. 

In the CS-based sensor network scheme, each sensor compresses the observed signals using 
a simple linear projection and transmits the compressed samples to the FC. Then, the FC can 
jointly reconstruct the received signals (by exploiting inter-sensor correlation) using one of the CS 
algorithms. Therefore, each sensor does not need to communicate with its neighboring sensors for 
joint compression. Our method is distributed compression in which the sensors do not have to talk to 
each other; only the joint recovery at the FC is needed. Thus, no intermediate stage is required 
to gather all of the samples at a single location and carry out compression aimed at 
exploiting inter-sensor correlation. The absence of intermediate stages also allows us to reduce the 
time delay significantly. Therefore, if the original data are compressed by CS, each sensor node produces 
a much smaller traffic volume that can be transmitted to the FC at a much lower transmission power 
and with a smaller time delay. 

Figure 15.4 CS sensor network scheme. 


15.4 Wireless Sensor Network System Model 
15.4.1 Multi-Sensor Systems and Observed Signal Properties 


Each sensor can observe only the local part of an entire physical phenomenon, and a certain event 
of interest is measured by one or more sensors. Therefore, the sensed signals are often partially 
correlated. These measured signals have two distinct correlations: intra-sensor correlation and inter- 
sensor correlation. Intra-sensor correlation exists in the signals observed by each sensor. Once a 
high-dimensional sensed signal has a sparse representation in a certain domain, we can reduce its size 
by using CS. This process exploits the intra-sensor correlation. By contrast, inter-sensor correlation 
exists among the signals sensed by different sensors. By exploiting inter-sensor correlation, further 
reduction in transmitted signals can be made. 

These two correlations can be exploited to improve the system performance. As the density of 
sensors in a region increases, each sensor observes a signal that is strongly correlated with 
those of neighboring sensors. By contrast, if we decrease the density of sensors distributed in a given 
region, the sensed signals will be more weakly correlated with each other. In this section, 
we discuss two strategies for transmitting signals in a multi-sensor CS-based system. One strategy 
uses only intra-sensor correlation and the other uses both types of correlation. We illustrate that 
a CS-based system in a WSN exploits the inter-sensor correlation more effectively and simply than 
a conventional sensor network does. 

Figure 15.5 Intra-sensor correlation scheme. 


1. Exploiting only intra-sensor correlation 
In Figure 15.5, each sensor observes the source signal and independently compresses it to 
a low-dimensional signal. After compression, each sensor transmits the compressed signal 
to the FC. Without exploiting inter-sensor correlation between transmitted signals, the 
FC recovers these signals separately. In this case, even if there exists a correlation among 
the sensed signals, because only intra-sensor correlation is exploited, we cannot gain any 
advantages from joint recovery. This method has the following characteristics: 
a. Independent compression and transmission at each sensor 
b. Signal recovery by exploiting only intra-sensor correlation at the FC 

2. Exploiting both intra- and inter-sensor correlations 


Figure 15.6 shows the same process as in situation (1) provided earlier, except that the FC exploits the 
inter-sensor correlation among sensed signals at the signal reconstruction stage. In a conventional sensor 
network system, as shown in Figure 15.3, the sensor nodes communicate with their neighboring 
sensors to take advantage of joint compression by exploiting inter-sensor correlation. However, 
in the CS-based system, inter-sensor correlation is exploited at the FC. This means 
that if inter-sensor correlation exists within the sensed signals, the FC can exploit it. This is done 
with sensors communicating with the FC but not among themselves. We refer to 
this communication strategy as distributed compressive sensing (DCS). Exploitation of inter-sensor 
correlation should manifest itself as a reduction of the measurement size m of the matrix 
$A \in \mathbb{R}^{m \times n}$, where y = Ax, required for good signal recovery. The characteristics of our DCS sensor 
network are as follows: 


1. Independent compression and transmission at each sensor 
2. Exploitation of inter-sensor signal correlation with the joint recovery scheme at the FC 
3. Variation of the per-sensor CS measurements to manipulate the level of signal correlation 


Figure 15.6 Intra/inter-sensor correlation scheme. 


Table 15.1 Synthetic Signal Models According to Correlation Degree 

                   | Correlation Characteristics                         | 
                   | Support Set         | Element Value                 | 
Correlation Degree | Common | Innovation | Same | Different | Both       | Model Name 
Weak               | O      | O          | X    | O         | X          | (Empty) 
                   | O      | O          | X    | X         | O          | JSM-1 
Strong             | O      | X          | X    | O         | X          | JSM-2 
                   | O      | X          | O    | X         | X          | (Empty) 


15.4.2 Correlated Signal Models and System Equations 


In this section, we introduce how signals with different degrees of correlation can be generated 
with sparse signal models. A sparse signal is a correlated signal. The degree of sparseness 
grows with the amount of correlation: a more correlated signal is sparser. In 
addition, inter-sensor signal correlation can be modeled by (1) the degree of overlap in the support 
sets of any two sparse signals and (2) the correlation of nonzero signal values. There are a number of 
papers that use interesting signal correlation models [6,10-12]. Table 15.1 is useful for identifying 
the two models we take from these papers and use in the subsequent sections. 

Table 15.1 lists the signal models introduced in [10,11]. In those references, the correlated 
signal is referred to as JSM-1 (joint signal model) or JSM-2 depending on the correlation type. In 
JSM-1, all of the signals share exactly the same common nonzero components that have the same 
values, whereas each signal also independently has different nonzero components, which are called 
innovations. Such a signal is expressed as 


$$x_j = z_c + z_j, \quad j \in \{1, 2, \ldots, J\} \qquad (15.11)$$


where j is the index of the sensors, $\|z_c\|_0 = K$, and $\|z_j\|_0 = K_j$. Obviously, $z_c$ appears in all the 
signals. It can be recognized as the inter-sensor correlation. We note that the intra-sensor correlation 
is that all of the signals are sparse. The jth sensor transmits $y_j = A_j x_j$ to the FC. After all the sensed 
signals are transmitted to the FC, the FC aims to recover all the signals. Because inter-sensor 
correlation exists in the sensed signals, 
we can obtain several benefits by using the correlated information in the transmitted signals. For 
ease of explanation, suppose that the WSN contains J sensors and its sensed signals follow JSM-1. 
Then, the FC can exploit both intra- and inter-sensor correlations by solving Equation 15.12 as 
described in the following. 


1. Joint recovery scheme for JSM-1 


The sensed signals from J sensors can be expressed as follows: 


$$\begin{aligned} x_1 &= z_c + z_1 \in \mathbb{R}^{n} \\ x_2 &= z_c + z_2 \in \mathbb{R}^{n} \\ &\ \ \vdots \\ x_J &= z_c + z_J \in \mathbb{R}^{n} \end{aligned}$$


where the sparsities of vectors $z_c$ and $z_j$ are K and $K_j$, respectively. 
The transmitted signal $y_j$ can be divided into two parts as follows: 


$$y_j = A_j (z_c + z_j) = A_j z_c + A_j z_j$$


Once the FC has received all the signals transmitted from the J sensors, it concatenates the sensing 
matrices and the received signals as in Equation 15.12. Because the common component $z_c$ appears only 
once in the equation, the total sparsity is reduced from $J \times (K + K_j)$ to $K + (J \times K_j)$. In an 
underdetermined problem, a smaller sparsity (fewer nonzero elements) favors exact reconstruction. We will 
show the relationship between exact reconstruction and sparsity from simulation results in a later section. 
By solving this equation, the FC can take advantage of exploiting inter-sensor correlation. 


$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_J \end{bmatrix} = \begin{bmatrix} A_1 & A_1 & 0 & 0 & \cdots & 0 \\ A_2 & 0 & A_2 & 0 & \cdots & 0 \\ A_3 & 0 & 0 & A_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ A_J & 0 & 0 & 0 & \cdots & A_J \end{bmatrix} \begin{bmatrix} z_c \\ z_1 \\ z_2 \\ z_3 \\ \vdots \\ z_J \end{bmatrix} \qquad (15.12)$$


However, if the FC recovers the received signals independently without using any correlation 
information, separate recovery is done. Even if the sensed signals are correlated, separate recovery 
offers no advantages for signal reconstruction because it does not exploit inter-sensor correlation. 
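The block structure of the joint system in Equation 15.12 can be sketched as follows. The helper name `joint_matrix`, the sizes, and the seed are illustrative assumptions; the sketch only builds the concatenated system and checks the sparsity count, and it does not perform the recovery itself.

```python
import numpy as np

def joint_matrix(A_list):
    """Stack per-sensor matrices into the block matrix of Equation 15.12.

    Row block j is [A_j, 0, ..., A_j, ..., 0]: the first column block multiplies
    the common part z_c and column block j + 1 multiplies the innovation z_j.
    """
    J = len(A_list)
    m, n = A_list[0].shape
    big = np.zeros((J * m, (J + 1) * n))
    for j, A_j in enumerate(A_list):
        rows = slice(j * m, (j + 1) * m)
        big[rows, :n] = A_j
        big[rows, (j + 1) * n:(j + 2) * n] = A_j
    return big

rng = np.random.default_rng(3)
J, m, n, K, K_j = 3, 6, 12, 4, 1
A_list = [rng.standard_normal((m, n)) for _ in range(J)]

z_c = np.zeros(n)
z_c[rng.choice(n, K, replace=False)] = rng.standard_normal(K)        # common part
z_list = []
for _ in range(J):
    z = np.zeros(n)
    z[rng.choice(n, K_j, replace=False)] = rng.standard_normal(K_j)  # innovation part
    z_list.append(z)

y = np.concatenate([A @ (z_c + z) for A, z in zip(A_list, z_list)])  # stacked y_j
z_joint = np.concatenate([z_c] + z_list)                             # [z_c; z_1; ...; z_J]
A_joint = joint_matrix(A_list)

print("joint system shape:", A_joint.shape)
print("nonzeros in joint unknown:", np.count_nonzero(z_joint), "=", K + J * K_j)
```

The joint unknown has K + J x K_j nonzero entries, compared with up to J x (K + K_j) when each x_j is treated on its own.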


2. Separate recovery scheme for JSM-1 


Even if a common correlated element exists in the sensed signals, separate recovery does not use 
that correlation information. Therefore, the received signals are recovered as follows: 


$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_J \end{bmatrix} = \begin{bmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_J \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_J \end{bmatrix} \qquad (15.13)$$

To solve Equations 15.12 and 15.13, we use the PDIP, which is an L1 minimization algorithm, 
and compare the results of the two types of recovery. Using the comparison results in a later section, 
we can confirm that the measurement size required for perfect reconstruction is smaller for joint 
recovery than for separate recovery. 

Now, we introduce JSM-2, which is simpler than JSM-1. All the signal coefficients are different, 
but their indices for nonzero components are the same. Suppose that there exist two signals, $x_1$ 
and $x_2$. The ith coefficient of $x_1$ is nonzero if and only if the ith coefficient of $x_2$ is nonzero. 
This property represents inter-sensor correlation, because if we know the support set for $x_1$, then 
we automatically know the support set for $x_2$. 


3. Recovery scheme for JSM-2 


The prior inter-correlation becomes relevant when the number of sensors is more than two. To 
reconstruct the transmitted signals of JSM-2, we can solve the following equation jointly: 


$$y_j = A_j x_j, \quad j \in \{1, 2, \ldots, J\} \qquad (15.14)$$


Like the FC in JSM-1, the FC in JSM-2 can exploit the fact that the support set is shared. By 
solving Equation 15.14 jointly in JSM-2, we obtain several benefits when the FC exploits inter-sensor 
correlation. If we solve this equation for each sensor separately rather than jointly, we have separate 
recovery. As an algorithm for solving the equation of the JSM-2 signal, we use simultaneous OMP, a 
modification of the OMP algorithm, in order to demonstrate the benefits when the FC exploits inter-sensor 
correlation. These algorithms are discussed in Section 15.5. 


15.5 Recovery Algorithms 


In this section, we discuss the recovery algorithms used to solve the underdetermined equation. 
The recovery algorithms used in CS can be classified as the greedy type and the gradient type. 

We introduce representative algorithms from these two types, the orthogonal matching pursuit 
(OMP) and the primal-dual interior point method (PDIP), respectively. 


15.5.1 Orthogonal Matching Pursuit (Greedy-Type Algorithm) 


The OMP is a well-known greedy-type algorithm [13]. OMP produces a solution within k steps because 
it adds one index to the sparse set $\Lambda$ at each iteration. The strategy of OMP is outlined in Tables 
15.2 and 15.3. 


Table 15.2 Inputs and Outputs of OMP Algorithm 

Input                                        | Output 
An m x n measurement matrix A                | An estimate $\hat{x}$ in $\mathbb{R}^{n}$ for the ideal signal 
An m-dimensional data vector y               | A set $\Lambda_k$ containing k elements from {1, ..., n} 
The sparsity level k of the ideal signal     | An m-dimensional approximation $\hat{y}_k$ of the data y 
                                             | An m-dimensional residual $r_k = y - \hat{y}_k$ 

Table 15.3 OMP Algorithm 


1. Initialize: 
Let the residual vector be $r_0 = y$, the sparse set $\Lambda_0 = \{\}$, and iteration number t = 1. 

2. Find the index $\lambda_t$: $\lambda_t = \arg\max_{i = 1, \ldots, n} |\langle r_{t-1}, a_i \rangle|$, where $a_i$ is the ith column vector of matrix A. 

3. Update set: $\Lambda_t = \Lambda_{t-1} \cup \{\lambda_t\}$. 

4. Signal estimate: $x_t(\Lambda_t) = A_{\Lambda_t}^{\dagger} y$ and $x_t(\Lambda_t^{C}) = 0$, where $x_t(\Lambda_t)$ is the set of elements 
whose indices correspond to the sparse set. 

5. Get new residual: $\hat{y}_t = A x_t$, $r_t = y - \hat{y}_t$. 

6. Increment t: Increase iteration number t = t + 1, and return to Step 2 if $t \le k$. 


Let us examine the OMP algorithm above. In step 2, OMP selects the one index that has a dominant 
impact on the residual vector r. Then, in step 3, the selected index is added to the sparse set, and the 
submatrix $A_{\Lambda_t}$ is constructed by collecting the column vectors of A corresponding to the indices 
of the sparse set $\Lambda_t$. OMP estimates the signal components corresponding to the indices of the 
sparse set and updates the residual vector by removing the estimated signal components in steps 4 
and 5, respectively. Finally, OMP finishes its procedure when the cardinality of the sparse set is k. 

OMP is a greedy-type algorithm because it selects the one index regarded as the optimal decision 
at each iteration. Thus, its performance is dominated by its ability to find the sparse set exactly. 
If the sparse set is not correctly reconstructed, OMP’s solution could be wrong. Because OMP 
is very easy to understand, a couple of modified algorithms based on OMP have been designed 
and developed. For further information on the OMP algorithm and its modifications, interested 
readers are referred to two papers [14,15]. 
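The steps of Table 15.3 can be sketched in NumPy as follows. This is an illustrative implementation under our own assumptions (function name, least squares via `numpy.linalg.lstsq`, a fixed number of k iterations, and the test sizes), not the code used for the experiments in this chapter.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit following the steps of Table 15.3."""
    n = A.shape[1]
    residual = y.copy()                  # Step 1: r_0 = y, empty sparse set
    support = []
    x_hat = np.zeros(n)
    for _ in range(k):
        # Step 2: column most correlated with the current residual
        idx = int(np.argmax(np.abs(A.T @ residual)))
        # Step 3: add the index to the sparse set
        support.append(idx)
        # Step 4: least-squares estimate restricted to the sparse set
        coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        x_hat = np.zeros(n)
        x_hat[support] = coeffs
        # Step 5: update the residual
        residual = y - A @ x_hat
    return x_hat, sorted(support)

rng = np.random.default_rng(4)
m, n, k = 64, 100, 3
A = rng.standard_normal((m, n)) / np.sqrt(m)
x = np.zeros(n)
true_support = [5, 42, 77]
x[true_support] = [1.5, -2.0, 1.0]

x_hat, support = omp(A, y=A @ x, k=k)
print("recovered support:", support)
print("max abs error:", np.max(np.abs(x_hat - x)))
```

Because each selected index is never removed, a single wrong choice in step 2 cannot be undone, which is why the quality of the solution depends on finding the sparse set exactly.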

We introduce another greedy-type algorithm based on OMP as an example: simultaneous 
orthogonal matching pursuit (SOMP) [14]. This greedy algorithm has been proposed for treating 
multiple measurement vectors for JSM-2 when the sparse locations of all sensed signals are the 
same. Namely, the SOMP algorithm handles multiple measurements $y_j$ as an input, where j is the 
index of distributed sensors, $j \in \{1, 2, \ldots, J\}$. In a later section, we use this algorithm to recover 
JSM-2. The pseudo code for SOMP is shown in Tables 15.4 and 15.5. 
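A compact sketch of SOMP for JSM-2 signals is given below. It follows the structure of Table 15.5, with one shared sparse set and per-sensor coefficients; the function name, the sizes, the coefficient ranges, and the seed are assumptions made for illustration.

```python
import numpy as np

def somp(A_list, y_list, k):
    """Simultaneous OMP: one shared support, per-sensor coefficients (cf. Table 15.5)."""
    n = A_list[0].shape[1]
    residuals = [y.copy() for y in y_list]
    support = []
    x_hats = [np.zeros(n) for _ in A_list]
    for _ in range(k):
        # Step 2: sum absolute correlations over all sensors, then pick the best index
        score = sum(np.abs(A.T @ r) for A, r in zip(A_list, residuals))
        support.append(int(np.argmax(score)))
        for j, (A, y) in enumerate(zip(A_list, y_list)):
            # Step 4: per-sensor least squares on the shared sparse set
            coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
            x_hats[j] = np.zeros(n)
            x_hats[j][support] = coeffs
            # Step 5: per-sensor residual
            residuals[j] = y - A @ x_hats[j]
    return x_hats, sorted(support)

rng = np.random.default_rng(5)
J, m, n, k = 4, 64, 100, 3
shared_support = [10, 50, 90]
A_list = [rng.standard_normal((m, n)) / np.sqrt(m) for _ in range(J)]
x_list = []
for _ in range(J):
    x = np.zeros(n)
    # Same support for every sensor, different values (JSM-2)
    x[shared_support] = rng.uniform(1.0, 2.0, size=k) * rng.choice([-1.0, 1.0], size=k)
    x_list.append(x)
y_list = [A @ x for A, x in zip(A_list, x_list)]

x_hats, support = somp(A_list, y_list, k)
print("recovered support:", support)
```

Summing the correlations over sensors is what lets every sensor contribute evidence for the same index, so adding sensors makes the index selection more reliable.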


Table 15.4 Inputs and Outputs of SOMP Algorithm 

Input                                        | Output 
An m x n measurement matrix $A_j$            | An estimate $\hat{x}_j$ in $\mathbb{R}^{n}$ for the ideal signal 
An m-dimensional data vector $y_j$           | A set $\Lambda_k$ containing k elements from {1, ..., n} 
The sparsity level k of the ideal signal     | An m-dimensional approximation $\hat{y}_{j,k}$ of the data $y_j$ 
                                             | An m-dimensional residual $r_{j,k} = y_j - \hat{y}_{j,k}$ 


Table 15.5 SOMP Algorithm 


1. Initialize: 
Let the residual be $r_{j,0} = y_j$, the sparse set $\Lambda_0 = \{\}$, and iteration number t = 1. 

2. Find the index $\lambda_t$: $\lambda_t = \arg\max_{i = 1, \ldots, n} \sum_{j=1}^{J} |\langle r_{j,t-1}, a_{j,i} \rangle|$, where $a_{j,i}$ is the ith column vector of 
matrix $A_j$. 

3. Update set: $\Lambda_t = \Lambda_{t-1} \cup \{\lambda_t\}$. 

4. Signal estimate: $x_{j,t}(\Lambda_t) = A_{j,\Lambda_t}^{\dagger} y_j$ and $x_{j,t}(\Lambda_t^{C}) = 0$, where $x_{j,t}(\Lambda_t)$ is the set of 
elements whose indices correspond to the sparse set. 

5. Get new residual: $\hat{y}_{j,t} = A_j x_{j,t}$, $r_{j,t} = y_j - \hat{y}_{j,t}$. 

6. Increment t: Increase iteration number t = t + 1, and return to Step 2 if $t \le k$. 


15.5.2 Primal-Dual Interior Point Method (Gradient-Type Algorithm) 


The L1 minimization in Equation 15.10 can be recast as linear programming. Here we examine this 
relationship. Clearly, the L1 minimization problem in Equation 15.10 is not a linear program as stated 
because its cost function is not linear. However, by using a new variable, we can transform it into a 
linear program. Thus, the problem that we want to solve is 


$$\min_{x,\,u} \ \sum_{i=1}^{n} u_i \quad \text{subject to} \quad |x_i| \le u_i \ \ \forall i, \quad Ax = b \qquad (15.15)$$


The solution of the earlier equation is equal to the solution of the L1 minimization problem. Many 
approaches to solving Equation 15.15 have been studied and developed. Here, we discuss the PDIP 
method, which is an example of gradient-type algorithms. First, we have the Lagrangian function 
of Equation 15.15, as follows: 


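$$L(t, \lambda, v) = \begin{bmatrix} 0_1^{T} & \mathbf{1}^{T} \end{bmatrix} t + v^{T} \left( \begin{bmatrix} A & 0_2 \end{bmatrix} t - b \right) + \lambda^{T} \begin{bmatrix} e & -e \\ -e & -e \end{bmatrix} t \qquad (15.16)$$


where e is the n x n identity matrix, $0_1$ is the n x 1 zero vector, $0_2$ is the m x n zero matrix, $\mathbf{1}$ 
is the n x 1 vector whose elements are all one, $t := \begin{bmatrix} x \\ u \end{bmatrix} \in \mathbb{R}^{2n \times 1}$, $v \in \mathbb{R}^{m \times 1}$, and $\lambda \in \mathbb{R}^{2n \times 1} \ge 0$. 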
u 


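From the Lagrangian function, we have several KKT conditions, 

$$\begin{aligned} \begin{bmatrix} 0_1 \\ \mathbf{1} \end{bmatrix} + \begin{bmatrix} A^{T} \\ 0_2^{T} \end{bmatrix} v + \begin{bmatrix} e & -e \\ -e & -e \end{bmatrix} \lambda &= 0_3 \\ \begin{bmatrix} A & 0_2 \end{bmatrix} t - b &= 0_4 \\ \begin{bmatrix} e & -e \\ -e & -e \end{bmatrix} t &\le 0_3 \\ \lambda &\ge 0_3 \\ \lambda_i \left( \begin{bmatrix} e & -e \\ -e & -e \end{bmatrix} t \right)_i &= 0, \quad i = 1, \ldots, 2n \end{aligned} \qquad (15.17)$$


where $0_3$ is the 2n x 1 zero vector and $0_4$ is the m x 1 zero vector. The main point of the PDIP is 
to seek the point $(t^*, \lambda^*, v^*)$ that satisfies the earlier KKT conditions. This is achieved by defining 
a mapping function $F(t, \lambda, v) : \mathbb{R}^{(4n+m) \times 1} \to \mathbb{R}^{(4n+m) \times 1}$, which is 


$$F(t, \lambda, v) = \begin{bmatrix} \begin{bmatrix} 0_1 \\ \mathbf{1} \end{bmatrix} + \begin{bmatrix} A^{T} \\ 0_2^{T} \end{bmatrix} v + \begin{bmatrix} e & -e \\ -e & -e \end{bmatrix} \lambda \\ \operatorname{diag}(\lambda) \begin{bmatrix} e & -e \\ -e & -e \end{bmatrix} t \\ \begin{bmatrix} A & 0_2 \end{bmatrix} t - b \end{bmatrix} = 0_5 \in \mathbb{R}^{(4n+m) \times 1}, \quad \begin{bmatrix} e & -e \\ -e & -e \end{bmatrix} t \le 0_3, \ \lambda \ge 0_3 \qquad (15.18)$$


where $0_5$ is the (4n + m) x 1 zero vector. Now, we would like to find the point $(t^*, \lambda^*, v^*)$ satisfying 
$F(t^*, \lambda^*, v^*) = 0_5$. Here, we use a linear approximation method. From the Taylor expansion of 
the function $F(t, \lambda, v)$, we have 


$$F(t + \Delta t, \lambda + \Delta\lambda, v + \Delta v) \approx F(t, \lambda, v) + \nabla_{(t, v, \lambda)} F(t, \lambda, v) \begin{bmatrix} \Delta t \\ \Delta v \\ \Delta\lambda \end{bmatrix} \qquad (15.19)$$


Thus, solving the earlier equations yields the direction $(\Delta t, \Delta v, \Delta\lambda)$. Next, we seek the proper 
step length along the direction that does not violate $\begin{bmatrix} e & -e \\ -e & -e \end{bmatrix} t^* \le 0_3$ and $\lambda^* \ge 0_3$. The pseudo 
code for the PDIP algorithm is shown in Table 15.6. 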


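Table 15.6 Primal-Dual Interior Point Method Algorithm 


1. Initialize: 
Choose $v^0 \in \mathbb{R}^{m \times 1}$, $\lambda^0 \ge 0_3$, and $t^0 = [x^0 \ u^0]^{T}$, where $x^0 = A^{\dagger} b$ and 
$u^0 = |x^0| + \alpha |x^0|$, and iteration number k = 1. (Here $A^{\dagger} = A^{T}(AA^{T})^{-1}$ is the 
Moore-Penrose pseudo-inverse of A, and $A^{T}$ denotes the transpose of A.) 

2. Find the direction vectors $(\Delta t, \Delta v, \Delta\lambda)$ by solving the linearized system of Equation 15.19: 
$\nabla_{(t, v, \lambda)} F(t^k, \lambda^k, v^k) \begin{bmatrix} \Delta t \\ \Delta v \\ \Delta\lambda \end{bmatrix} = -F(t^k, \lambda^k, v^k)$. 

3. Find the proper step length: 
Choose the largest $\alpha$ satisfying $\|F(t^k + \alpha\Delta t, \lambda^k + \alpha\Delta\lambda, v^k + \alpha\Delta v)\|_2^2 \le \|F(t^k, \lambda^k, v^k)\|_2^2$. 

4. Update parameters: $t^{k+1} = t^k + \alpha\Delta t$, $v^{k+1} = v^k + \alpha\Delta v$, $\lambda^{k+1} = \lambda^k + \alpha\Delta\lambda$. 

5. Update the signal: $x^{k+1} = x^k + \alpha\Delta t[1:n]$. 

6. Increment the iteration number k: Increase iteration number k = k + 1, and 
return to Step 2 if $\|y - Ax^k\|_2^2 > \text{eps}$. 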

15.6 Performance Evaluation 


In this section, we investigate the performance of a WSN system that applies CS by using the 
PDIP or the SOMP. We divide this section into four subsections that analyze the relationship among 
the number of measurements M, sparsity k depending on the number of sensors J, the degree of 
correlation, and the signal-to-noise ratio (SNR), respectively. To avoid confusion regarding the 
graphs, we define the notations and metrics used in the experiments in Tables 15.7 and 15.8, 
respectively. 


Table 15.7 Notation Used in Experiments 


Notation 

N: Length of the signal x at each sensor 
M: Length of the measurement y 
j: Index of the sensors, $j \in \{1, 2, \ldots, J\}$ 
y: Signal transmitted from each sensor 
A: Sensing matrix, $A \in \mathbb{R}^{M \times N}$; its elements are generated by a Gaussian distribution 
x: Sparse signal on the sensor; its elements also have a Gaussian distribution 
n: Additive white Gaussian noise (AWGN) 
K: Common sparsity number 
$K_j$: Innovation sparsity number 


Table 15.8 Metrics Used in Experiments 


Metrics 


1. SNR (dB): Signal-to-noise ratio, 

$$10 \log_{10} \frac{\|Ax\|_2^2}{M \sigma_n^2}$$

where $\sigma_n^2$ is the variance of noise. 

2. MSE: Mean square error, 

$$\frac{\|\hat{x} - x\|_2^2}{\|x\|_2^2}$$

where 
$\hat{x}$ is the recovered signal 
x is the original signal 
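The two metrics can be computed as in the following sketch. It assumes that the MSE is normalized by the energy of the original signal and that the SNR uses the per-measurement signal power divided by the noise variance; the function names and the helper that inverts the SNR formula are our own additions for illustration.

```python
import numpy as np

def snr_db(A, x, noise_var):
    """SNR in dB: 10 * log10(||A x||_2^2 / (M * noise_var)), M = number of rows of A."""
    M = A.shape[0]
    return float(10.0 * np.log10(np.sum((A @ x) ** 2) / (M * noise_var)))

def normalized_mse(x_hat, x):
    """Squared error divided by the energy of the original signal (assumed normalization)."""
    return float(np.sum((x_hat - x) ** 2) / np.sum(x ** 2))

def noise_var_for_snr(A, x, target_db):
    """Noise variance that gives the requested SNR for a given A and x."""
    M = A.shape[0]
    return float(np.sum((A @ x) ** 2) / (M * 10.0 ** (target_db / 10.0)))

A = np.eye(2)
x = np.array([3.0, 4.0])
print("SNR (dB):", snr_db(A, x, noise_var=1.25))
print("MSE:", normalized_mse(np.array([3.0, 3.0]), x))
print("noise variance for 20 dB:", noise_var_for_snr(A, x, 20.0))
```

A helper like `noise_var_for_snr` is convenient when an experiment sweeps the SNR, because the AWGN samples can then be drawn with the matching variance for each trial.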

The proposed correlation signals, JSM-1 and JSM-2, as described in Table 15.1, will also be 
investigated in terms of various parameters, such as signal length, matrix size, and sparsity number. 
To recover the JSM-1 signal (which includes both common and innovation components, and the 
common component has the same values for every sensor) from the received signal y, we use the PDIP 
algorithm. However, to recover the JSM-2 signal (which includes only a common component that 
has different values for every sensor), we use SOMP. It is inappropriate to apply SOMP to the 
JSM-1 signals because there exists an innovation component in every sensed signal $x_j$. Although 
SOMP can identify the common part exactly, confusion may arise regarding the optimal selection 
for the innovation component. Because SOMP selects only one index that has the optimal value 
among the vector elements of length N in every iteration, if the selected index is included in the 
innovation component of only one sensor node, the solution cannot be correct. 

For this reason, we use the SOMP algorithm to recover only the JSM-2 signal. If we were to use 
SOMP to recover the JSM-1 signal, we would need to improve the algorithm for finding the innovation 
component. From the results of simulations using those two recovery algorithms, we determined 
the relationship among the sensors, measurement, and amount of correlation in the unknown 
sensor signals. 


15.6.1 Reconstruction Performance as a Function of Sparsity 


Figure 15.7 shows the results when the PDIP algorithm was used to reconstruct signals for JSM-1. 
We increased the common sparsity K of each sensor and the number of sensors J while fixing the 
signal length, the number of measurements, and the innovation sparsity of each sensor at N = 50, 
M = 20, and K_j = 3, respectively. The FC concatenates the received signals y_j, j ∈ {1, 2, ..., J}, 
into [y_1, y_2, ..., y_J]^T and assembles the sensing matrices into the integrated matrix A_PDIP of 
Equation 15.20. Thus, the equation is y_PDIP := A_PDIP z_PDIP, and the number of measurements 
in this equation is M_PDIP = M × J; the FC then uses the PDIP algorithm to obtain z_PDIP from 
y_PDIP. The recovered signal [ẑ_c, ẑ_1, ẑ_2, ..., ẑ_J]^T obtained from [y_1, y_2, ..., y_J]^T was 
compared with the original [z_c, z_1, z_2, ..., z_J]^T in order to calculate the probability of exact 
reconstruction. 

\[
y_{\mathrm{PDIP}} :=
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_J \end{bmatrix}
=
\underbrace{\begin{bmatrix}
A_1 & A_1 & 0 & \cdots & 0 \\
A_2 & 0 & A_2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
A_J & 0 & 0 & \cdots & A_J
\end{bmatrix}}_{A_{\mathrm{PDIP}}}
\underbrace{\begin{bmatrix} z_c \\ z_1 \\ z_2 \\ \vdots \\ z_J \end{bmatrix}}_{z_{\mathrm{PDIP}}}
\qquad (15.20)
\]

Figure 15.7 Signal reconstructed using PDIP algorithm for JSM-1. System parameters are N = 
50, M = 20, and innovation sparsity K_j = 3. 

In this case, even if the original signal x_j = z_c + z_j ∈ R^N, j ∈ {1, 2, ..., J}, is not sparse, the 
signals transmitted from the sensors can be recovered perfectly at the FC if every sensor has only a 
small number K_j of innovation components, corresponding to z_j. However, as the number of sensors 
increases, the integrated matrix also becomes large. Consequently, the computation becomes 
complex, and much time is required to obtain the solution. 
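
As a minimal sketch of how the integrated system of Equation 15.20 can be assembled, the snippet below stacks per-sensor matrices into one block matrix. The helper name `build_joint_system` and the random test data are assumptions made for this illustration; the block layout (A_j acting on the common component and on the j-th innovation) follows the JSM-1 description in the text.

```python
import numpy as np

def build_joint_system(A_list):
    """Stack per-sensor matrices A_j (each M x N) into the (J*M) x ((J+1)*N) joint matrix."""
    J = len(A_list)
    M, N = A_list[0].shape
    A_joint = np.zeros((J * M, (J + 1) * N))
    for j, A in enumerate(A_list):
        rows = slice(j * M, (j + 1) * M)
        A_joint[rows, :N] = A                        # block acting on the common part z_c
        A_joint[rows, (j + 1) * N:(j + 2) * N] = A   # block acting on the innovation z_j
    return A_joint

# Illustrative data only: J sensors, M measurements each, signal length N.
rng = np.random.default_rng(0)
J, M, N = 3, 20, 50
A_list = [rng.standard_normal((M, N)) for _ in range(J)]
z_c = rng.standard_normal(N)
z_innov = [rng.standard_normal(N) for _ in range(J)]

A_joint = build_joint_system(A_list)
z_joint = np.concatenate([z_c] + z_innov)
y_joint = A_joint @ z_joint   # equals the concatenation of A_j (z_c + z_j)
```

The joint matrix has J·M rows and (J+1)·N columns, so both dimensions grow linearly with the number of sensors, which is the source of the computational burden noted above.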

Figure 15.8 illustrates the use of the SOMP algorithm to recover JSM-2. The fixed parameters are 
the signal length N and measurement size M of each sensor. To determine the effect of the number 
of sensors and sparsity in the WSN, we increased the sparsity K and the number of sensors J. Because 
the JSM-2 signal has the same sparse location for every sensor, the sparse location can be found 
easily by using SOMP. As the number of sensors increases, the probability of making the optimal 
decision at each iteration is greater. As a result, exact reconstruction is achieved, as shown in 
Figure 15.8. 

Figure 15.8 Signal reconstructed using SOMP for JSM-2 for increasing common sparsity K and 
number of sensors J. System parameters: N = 256 and M = 32. 

In both cases, we notice that the probability of successful reconstruction increases as the number 
of sensors increases, because both algorithms use the prior information that the signals are correlated. 
For example, when only the common sparsity K increases, all of the signals can still be reconstructed 
simply by increasing the number of sensors. Interestingly, the curve of Figure 15.7 in the JSM-1 
experiment does not show convergence as the number of sensors increases. On the other hand, that of 
Figure 15.8 in the JSM-2 experiment converges to M − 1 as the number of sensors increases. These 
results are determined by the ratio of the number of measurements to the sparsity (M/K) in the 
compressive sensing equation. In the case of Figure 15.7, as the number of sensors increases, the 
number of measurements M_PDIP also increases. Thus, as the number of sensors increases, the ratio 
also changes. (In our experiment, we choose K_j = 3, where K_j < M. Therefore, the ratio increases 
as the number of sensors increases.) In the case of Figure 15.8, the ratio does not change regardless 
of the number of sensors. The varying ratio (M/K) in the JSM-1 experiment is the reason its curve 
does not converge, in contrast with that of the JSM-2 experiment. 


SUMMARY 15.1 RECONSTRUCTION PERFORMANCE AS A FUNCTION 
OF SPARSITY 


We investigate how an increase in the sparsity K of the signal at each sensor affects the 
reconstruction performance of the joint recovery algorithms, while the signal length N and the 
number of measurements M are fixed at each sensor. As the common sparsity K of each sensor 
increases, the probability of exact reconstruction decreases, as expected. Equation 15.20 results 
from the JSM-1 model, which represents both common and innovation elements in each sensor 
and allows exploitation of inter-sensor correlation. Thus, as the number of sensors increases, 
the total sparsity and the number of measurements M_PDIP also increase, as shown in Equation 
15.20. In JSM-2, the sparsity K and the number of measurements M per sensor are fixed by 
the formulation in (15.14), regardless of the number of sensors. This difference in how the ratio 
of measurements to sparsity varies accounts for the different results in Figures 15.7 and 15.8. 


15.6.2 Relationship between the Number of Sensors and the Number 
of Measurements Required for Exact Reconstruction 


In Figure 15.9, we show the results when we increased the number of measurements and the 
number of sensors while fixing the signal length (N = 50), common component (K = 9), and 
innovation component (K_j = 3). As the number of sensors increased, the number of measurements 
required for the probability of exact reconstruction to converge to one decreased. Therefore, if we 
use many sensors to reconstruct the correlated signal, we can reduce the number of measurements, 
which in turn reduces the transmission power at each sensor. However, as Figures 15.9 and 15.10 
show, the decrease in measurement size is limited by the sparsity number (K + 1) in one sensor. 

For JSM-2 signals, reconstruction is similar to that of JSM-1 signals in terms of the effect of 
increasing the number of sensors when the correlated signal is jointly recovered (Figure 15.10, 
solid line). However, if signal reconstruction is performed separately, more measurements per 
sensor are needed as the number of sensors J increases (Figure 15.10, dotted line). Because 
the transmitted signals from each sensor are reconstructed independently, if the probability of 
successful reconstruction for one sensor is p ≤ 1, then the total probability of successful 
reconstruction for all J transmitted signals is p^J. 
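
A short numerical illustration of the p^J relation follows. The function names are hypothetical and the numbers are examples only, not values read from Figure 15.10; the sketch assumes, as the text does, that the per-sensor reconstructions succeed independently with the same probability p.

```python
def all_sensors_success(p, J):
    """Probability that J independent reconstructions all succeed, each with probability p."""
    return p ** J

def required_per_sensor_success(target, J):
    """Per-sensor success probability needed so that all J reconstructions succeed with `target`."""
    return target ** (1.0 / J)

# Example: a per-sensor success rate of 0.9 drops quickly as sensors are added.
for J in (1, 2, 5, 10):
    print(J, all_sensors_success(0.9, J))
```

Since p^J shrinks as J grows, each sensor must reach a higher individual success probability, and therefore needs more measurements, to keep the overall success probability fixed under separate recovery.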

Figure 15.9 Signal reconstructed using PDIP algorithm for JSM-1. System parameters: N = 50, 
K = 9, and K_j = 3. 

Figure 15.10 Reconstruction using SOMP for JSM-2. System parameters: N = 50 and K = 5. 
Solid line: joint recovery; dotted line: separate recovery. 




SUMMARY 15.2 RELATIONSHIP BETWEEN THE NUMBER OF SENSORS AND 
THE NUMBER OF MEASUREMENTS REQUIRED FOR EXACT 
RECONSTRUCTION 


We investigate how the probability of exact reconstruction changes as the number of sensors 
increases. As the number of sensors increases, the signals that the FC collects exhibit more 
inter-sensor correlation, and the number of measurements per sensor required for exact 
reconstruction decreases. Figures 15.9 and 15.10 show that the original signals can be recovered 
with high probability at a fixed number of measurements as J → ∞ and that the number of 
per-sensor measurements required for perfect signal recovery converges to K + 1. 


15.6.3 Performance as a Function of SNR 


In this section, we present the system performance of a WSN that uses CS in an additive white 
Gaussian noise (AWGN) channel. As in the other experiments, we used a Gaussian distribution to 
create the sensing matrix A_j, j ∈ {1, 2, ..., J}, and the sparse signal x_j, and then added AWGN n 
to the measurement y_j = A_j x_j. At the FC, the received signal ỹ_j = y_j + n was recovered jointly. 
We increased the number of sensors while fixing the signal length, number of measurements, common 
sparsity, and innovation sparsity at N = 50, M = 20, K = 3, and K_j = 2, respectively. In this 
experiment, the SNR is set as follows: 

\[
\mathrm{SNR\ (signal\ to\ noise\ ratio)} = 10 \log_{10} \frac{\| A_j x_j \|_2^2}{M \sigma^2}
\]

where 
‖A_j x_j‖_2^2 is the transmitted signal power at sensor j 
M is the number of measurements 
σ^2 is the noise variance 

To estimate the reconstruction error between the original signal x_j and the reconstructed signal 
x̂_j, we used the mean square error (MSE) as follows: 

\[
\mathrm{Mean\ square\ error} = \frac{\| \hat{x}_j - x_j \|_2^2}{\| x_j \|_2^2}
\]
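
The noise scaling implied by the SNR definition can be sketched as follows. This is an illustrative sketch, not the authors' simulation code: the function names are hypothetical, the noise variance is obtained by solving the SNR expression for σ^2, and the normalization of the error by the signal energy is an assumption of this sketch.

```python
import numpy as np

def add_awgn(y, target_snr_db, rng):
    """Add white Gaussian noise so that ||y||^2 / (M * sigma2) matches the target SNR in dB."""
    M = y.size
    sigma2 = np.sum(y ** 2) / (M * 10 ** (target_snr_db / 10))
    noisy = y + rng.normal(0.0, np.sqrt(sigma2), size=M)
    return noisy, sigma2

def snr_db(y, sigma2):
    """SNR in dB for a noiseless measurement vector y and noise variance sigma2."""
    return 10 * np.log10(np.sum(y ** 2) / (y.size * sigma2))

def normalized_mse(x, x_hat):
    """Squared reconstruction error divided by the signal energy (assumed normalization)."""
    return np.sum((x - x_hat) ** 2) / np.sum(x ** 2)
```

With this convention, a perfect reconstruction gives an error of 0 and an all-zero estimate gives an error of 1, which makes curves for different sensors directly comparable.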


We applied the PDIP algorithm to solve Equation 15.20 for JSM-1 and obtained the solution 
[ẑ_c, ẑ_1, ẑ_2, ..., ẑ_J]^T. Because of the effect of noise, the solution [ẑ_c, ẑ_1, ẑ_2, ..., ẑ_J]^T 
is not sparse. Therefore, we kept only the largest K + (J × K_j) values from among the elements 
of the solution. To compare each recovered signal x̂_j with the original sensed signal x_j, we split 
the concatenated solution [ẑ_c, ẑ_1, ẑ_2, ..., ẑ_J]^T into the individual recovered signals x̂_j. The 
results are shown in Figure 15.11. 
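
The post-processing described above can be sketched in a few lines. The helper names `keep_largest` and `split_jsm1` are hypothetical, and the sketch assumes the concatenated solution is ordered as the common part followed by the J innovation parts, each of length N, with each sensor signal formed as the sum of the common part and its innovation.

```python
import numpy as np

def keep_largest(z, n_keep):
    """Zero all but the n_keep largest-magnitude entries of z."""
    out = np.zeros_like(z)
    if n_keep <= 0:
        return out
    idx = np.argsort(np.abs(z))[-n_keep:]
    out[idx] = z[idx]
    return out

def split_jsm1(z_joint, N, J):
    """Split [z_c, z_1, ..., z_J] into per-sensor signals x_j = z_c + z_j."""
    z_c = z_joint[:N]
    return [z_c + z_joint[(j + 1) * N:(j + 2) * N] for j in range(J)]

# Usage with the parameters of this experiment: keep K + J*K_j entries, then split.
# x_hat = split_jsm1(keep_largest(z_joint_hat, K + J * K_j), N, J)
```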

To obtain the results in Figure 15.12, we used the SOMP algorithm for JSM-2 with the same 
processing. In contrast to the PDIP algorithm, the SOMP algorithm first searches for the support 
set; therefore, it does not require a step in which the largest K values are chosen from among the 
elements of the solution. However, if the selected support set is wrong, the reconstruction is also 
wrong. Both results, Figures 15.11 and 15.12, show that if we increase the number of sensors, 
the MSE improves and finally converges to zero as the SNR increases. Even if the transmitted 
signals contain much noise, having a large number of sensors to observe the correlated signal in 
the sensed region facilitates the search for the exact solution. In Figure 15.12, when the number of 
sensors is two or three, the MSE does not converge to zero even if the SNR is high. The SOMP 
algorithm uses cross-correlation to find the support set (step 2 of Table 15.5). If the rank of the 
sensing matrix A is smaller than the number of columns in A, the columns are significantly 
correlated with one another. Consequently, the SOMP algorithm may select the wrong support 
location. However, this problem can be solved by using a large number of sensors. 

Figure 15.11 Signal reconstructed using PDIP method for JSM-1. System parameters: N = 50, 
M = 20, K = 3, and K_j = 2. 

Figure 15.12 Signal reconstructed using SOMP for JSM-2. System parameters: N = 50, M = 20, 
and K = 5. 


SUMMARY 15.3 PERFORMANCE AS A FUNCTION OF SNR 


We investigate the effect of noise in a CS-based WSN. In particular, we examine how the 
MSE decreases as the SNR increases. Figures 15.11 and 15.12 show similar results. As the number 
of sensors increases, more correlated observations are available, which helps signal recovery. 


15.6.4 Joint versus Separate Recovery Performance as a Function 
of Correlation Degree 


Now, we compare the results of joint recovery and separate recovery (Figure 15.13). In joint 
recovery, if a correlation exists between the signals observed from the distributed sensors, the 
FC can use the correlated information to recover the transmitted signals. In separate recovery, 
correlated information is not used regardless of whether a correlation pattern exists between the 
observed signals. In Figure 15.13, solid lines were obtained from joint reconstructions, whereas 
dotted lines are the results of separate reconstructions. 

Figure 15.13 Joint (solid line) and separate (dotted line) reconstructions using PDIP algorithm 
for JSM-1. System parameters: N = 50 and J = 2. The benefits of joint reconstruction depend on 
the sparsity number K. 

Figure 15.14 Joint (solid line) and separate (dotted line) reconstructions using SOMP for JSM-2. 
System parameters: N = 50 and J = 2. Joint reconstruction has a higher probability of success 
than separate reconstruction. 

When we use separate reconstruction, we cannot obtain any benefit from correlated information. 
However, when we use joint reconstruction, we can reduce the measurement size. For 
example, in Figure 15.14, the required number of measurements is almost 40 (dotted line and 
circles, K = 6) for perfect reconstruction when we use separate reconstruction. On the other 
hand, when we use joint reconstruction, it decreases to around 30 (solid line and circles, K = 6). 
Furthermore, as the common sparsity increases, the performance gap increases. For example, when 
the common sparsity is 9, joint reconstruction has a 90% probability of recovering all the signals at 
M = 30. However, the probability that separate reconstruction can recover all the signals is only 
70%. Figure 15.13 also shows that joint reconstruction is superior to separate reconstruction. For 
example, we need at least 30 measurements for reliable recovery using separate reconstruction, 
but only about 25 measurements for reliable recovery using joint reconstruction. 


SUMMARY 15.4 JOINT VERSUS SEPARATE RECOVERY PERFORMANCE AS A 
FUNCTION OF CORRELATION DEGREE 


If a correlation exists between the signals observed from the distributed sensors, and if the FC 
uses the joint recovery, then it can reduce the measurement size required for exact reconstruction 
in comparison with that of the separate recovery. As the degree of correlation increases, the gap 
in the results of the two methods (joint recovery and separate recovery) widens, as shown in 
Figures 15.13 and 15.14. 


15.7 Summary 


In this chapter, we discussed the application of CS for WSNs. We assumed a WSN consisting 
of spatially distributed sensors and one FC. The sensor nodes take signal samples and pass their 
acquired signal samples to the FC. When the FC receives the transmitted data from the sensor 
nodes, it aims to recover the original signal waveforms for later identification of the events possibly 
occurring in the sensed region (Section 15.3.1). 

We discussed CS as a possible solution that provides simpler signal acquisition and 
compression. CS is suitable for WSNs since it allows removal of intermediate stages such as 
sampling the signal and gathering the sampled signals at one collaboration point, which would 
usually be the case in a conventional compression scheme. Using CS, the number of signal samples 
that need to be transferred to the FC from the sensors can be significantly reduced. This may lead 
to a reduction of power consumption at the sensor nodes, which was discussed in Section 15.3.3. 
In summary, each sensor with CS can save power by not needing to run complex compression 
operations onboard and by cutting down signal transmissions. 

Distributed sensors usually observe a single globally occurring event and thus the observed 
signals are often correlated with each other. We considered two types of correlations: intra- and 
inter-sensor signal correlations. We provided the sparse signal models, which encompass both types 
of correlation in Sections 15.4.1 and 15.4.2. 

Figure 15.15 Summary of CS application in WSN. 

The FC receives the compressed signals from the sensors. The FC then recovers the original 
signal waveforms from the compressed signals using a CS recovery algorithm. We considered two 
types of algorithms. One is a greedy type, which includes the OMP and the SOMP algorithms, 
discussed in Section 15.5.1. The other is a gradient type for which we used the PDIP method, in 
Section 15.5.2. 

Finally, we presented simulation results in which the CS-based WSN system parameters such 
as the number of measurements, the sparsity, and the signal length were varied. We discussed the 
use of a joint recovery scheme at the FC. A CS recovery algorithm is referred to as a joint recovery 
scheme when it also utilizes inter-sensor signal correlation. By contrast, when the inter-sensor 
signal correlation is not utilized, it is referred to as a separate recovery scheme. In the joint recovery 
scheme, inter-sensor signal correlation information is incorporated in the formation of the recovery 
equation, as shown in Equations 15.12 and 15.14. In the separate recovery scheme, each sensor 
signal is recovered individually and independently of the other sensor signals. We 
compared the results of the joint recovery with those of the separate recovery scheme. We have 
shown that correlation information can be exploited and that the number of measurements needed 
for exact reconstruction can be significantly reduced, as shown in Figure 15.14. This means that the 
traffic volume transmitted from the sensors to the FC can decrease significantly without degrading 
the quality of the recovery performance (Section 15.6). 

We have shown that CS is an efficient and effective signal acquisition and sampling framework 
for WSNs (Figure 15.15), which can be used to significantly reduce transmission and computational 
power at the sensor node. This CS-based signal acquisition and compression scheme is very 
simple, so it is suitable for inexpensive sensors. The number of compressed samples required for 
transmission from each sensor to the FC is small, which makes the scheme well suited to sensors 
whose operational power is drawn from an onboard battery. Finally, the joint CS recovery at the FC 
exploits signal correlation and enables DCS. 

Compressive sensing (CS) is a new paradigm in signal processing and sampling theory. In this 
chapter, we introduce the mathematical foundations of this novel theory and explore its applications 
in wireless sensor networks (WSNs). CS is an important achievement in sampling theory and signal 
processing. It is increasingly being applied in many areas like multimedia, machine learning, medical 
imaging, etc. We focus on the aspects of CS theory that have direct applications in WSNs. We also 
investigate the most well-known implementations of CS theory for data collection in WSNs. 

16.1 Compression of Signals in Sensory Systems 
16.1.1 Components of Distributed Sensory Systems 


When we look at many sensory systems, we usually observe three different functions. 


1. Sensing: A sensor records the physical value of interest, like light intensity, sound, pressure, 
temperature, etc., and an analog-to-digital converter (ADC) quantizes the recorded value 
and outputs a binary number that reflects the current sensed value. 

2. Transmission: The sensor recordings are transmitted periodically over a communication 
channel. Depending on the technology used for transferring data, this process entails different 
levels of transmission cost. For example, a webcam connected directly to a computer has a 
fast, accurate, and cheap (low power, inexpensive) communication channel for transferring 
data. In several applications, the raw data recorded by sensors are transmitted over a noisy 
wireless channel. For example, in a WSN, a vast amount of data has to be transmitted over 
a noisy wireless channel. Because of collision, congestion, and inevitable errors, some of 
the packets are dropped, causing more and more retransmissions. Such a communication 
channel is too expensive for the sensor nodes (SNs) since each transmission consumes a lot of 
node battery power. When the communication is too expensive or the amount of raw data 
is far more than channel capacity, the raw data must be compressed before transmission to 
utilize the communication line efficiently. 

3. Processing and storage: The raw data collected from the individual sensors are processed 
by a fusion center or stored in a persistent storage mechanism for future processing. The 
data collected from the network can be encoded, encrypted, or compressed. Depending 
on the application of the sensory systems, a combination of compression, encryption, or 
other special encodings is applied. In each case, a suitable decoding, decompression, or 
decryption technique may be applied by the receiver. Note that when the acquired data 
are too large in raw format, a compression technique must be applied before storage or 
forwarding. 


16.1.2 Compressibility of Signals 


Figure 16.1 shows an example of a Fourier transform performed on a signal acquired from a 
temperature sensor. Although the signal itself in the time domain does not appear to have a specific 
structure, its Fourier transform is quite compressible. We see that only very few Fourier coefficients 
have a large magnitude and most of the other coefficients are negligible. One can devise a 
compression technique by keeping only the largest Fourier coefficients and discarding the others. 

To show how efficient transform-coding compression techniques are, we do a simple numerical 
experiment on the time-domain signal in Figure 16.1. We represent the signal as a vector f ∈ R^N. 
Assume x ∈ C^N is the Fourier transform of f. We select the k largest elements of x and set its other 
N − k elements to zero. Let y be the resulting vector. Now, we apply an inverse Fourier transform 
to y and get the vector g, a recovered version of the original signal f. Since the signal is 
recovered from an incomplete set of information, reconstruction error is inevitable. We measure the 
error by calculating the mean square error (MSE). Figure 16.2 illustrates that the recovery error 
rapidly decreases when we increase the number k of retained Fourier coefficients. In other words, 
almost all the energy of the time-domain signal in Figure 16.1 is concentrated in the few largest 
Fourier coefficients of its frequency-domain transform. 
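
The experiment just described can be reproduced in outline with the following sketch. The temperature data of Figure 16.1 are not available here, so a synthetic two-tone signal is used as a stand-in; the function names are hypothetical, and taking the real part of the inverse transform is a choice made in this sketch for real-valued input.

```python
import numpy as np

def topk_fourier_approx(f, k):
    """Keep the k largest-magnitude Fourier coefficients of f and invert the transform."""
    x = np.fft.fft(f)
    y = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-k:]
    y[keep] = x[keep]
    # f is real, so only the real part of the inverse transform is kept.
    return np.fft.ifft(y).real

def mse(f, g):
    """Mean square error between the original signal f and its approximation g."""
    return np.mean((f - g) ** 2)

# Synthetic stand-in signal: two sinusoids, i.e., four nonzero Fourier coefficients.
N = 256
n = np.arange(N)
f = 3 * np.cos(2 * np.pi * 5 * n / N) + np.sin(2 * np.pi * 12 * n / N)
errors = {k: mse(f, topk_fourier_approx(f, k)) for k in (2, 4, 8)}
```

For this synthetic signal the error drops to numerical zero once k reaches 4; a measured signal would instead show a gradual decay of the kind plotted in Figure 16.2.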

Figure 16.1 Compressibility of a natural signal under Fourier transform. 

Figure 16.2 Reconstruction error decay for a signal compressible in Fourier domain. 


Many other signals acquired from natural phenomena like sound, radiation level, and humidity, as 
well as more complex audiovisual signals recorded by cameras and microphones, have a very sparse 
support in a suitably chosen domain like Fourier, wavelet, or discrete cosine transform (DCT). 
The overall function of a transform coding system built on this property is depicted in Figure 16.3. 


16.1.3 Compression Techniques 


In general, we see a variety of compression and decompression requirements in sensory systems, 
especially distributed sensor networks like WSNs. Compression can be either lossless or lossy. 

Figure 16.3 Traditional transform coding compression and decompression system. 


■ Lossless compression refers to compression techniques that conserve the whole raw data 
without losing accuracy. Lossless compression is the basis of the general-purpose data 
compression widely used in commercial compression applications. Some 
of the most commonly used lossless compression algorithms are run-length encoding, 
Lempel-Ziv-Welch (LZW), and the Lempel-Ziv-Markov chain algorithm. These algorithms 
are used in many compression applications and several file formats like graphics interchange 
format, portable network graphics, ZIP, GNU ZIP, etc. 

■ Lossy compression, on the other hand, allows unimportant data to be lost for the sake of 
more important data. Most lossy compression techniques involve transforming the 
raw data to some other domain like the frequency domain or a wavelet domain. Therefore, such 
techniques are also called transform coding. It has been known for decades that natural 
signals like audio, images, and video, and signals recorded from other physical phenomena like 
seismic vibrations, radiation level, etc., have a sparse or compressible support in Fourier, 
DCT, and wavelet domains. An encoder of such natural signals can transform the raw data using a 
suitable transformation and throw out negligible values. A decoder can recover the original 
signal with some small error from a smaller number of compressed data items. 


CS theory describes a lossy compression technique that differs from transform coding in 
measurement acquisition, as we explain later in this chapter. First, we study the transform coding 
compression and decompression technique in more detail and point out its inefficiencies. Moreover, 
transform coding and CS are strongly connected topics, and a good knowledge of transform 
coding helps in understanding CS and its advantages. 

The encoding part of the compression system illustrated in Figure 16.3 is a complex function that is 
run on a large amount of data. Note that after determining the largest elements of the transformed 
signal, negligible values are simply thrown away. One can ask why such a large amount of 
data should be acquired to obtain a small amount of information. This question is answered by 
compressive sensing theory. It says that under certain conditions, a compressible signal can be 
recovered from a small number of random linear measurements. The aim of CS is to bring the 
compression techniques down to the sampling level. As we see later in this chapter, this is especially 
important in WSNs, as we prefer to compress the data as it is being transported over the network. 
Random measurement makes it easier to implement CS in a distributed manner; hence no centralized 
decision and control is needed. Linearity of the measurement mechanism simplifies hardware and 
software design for the sensor nodes. As we describe later in this chapter, acquiring linear 
measurements from the network requires the nodes to perform only multiplications and additions, 
so the complexity of the calculations is kept to a minimum. Moreover, such operations can be done 
more efficiently if cheap low-power hardware is built into the sensor nodes. 


16.2 Basics of Compressive Sensing 


Throughout this text we may use the terms signal and vector interchangeably. Both refer to discrete 
values of a spatiotemporal phenomenon. For example, a WSN consisting of n SNs, each of which 
records r samples in every interval of T seconds, produces a discrete spatiotemporal signal (vector) 
f ∈ R^{nr}. To begin a formal description of the CS theory, we need some mathematical definitions 
that may apply both to the original signal in the time or space domain and to its projection on some 
frequency domain. Therefore, we use the notation v for a general vector that may refer to a signal or 
its transformation. The first three definitions are general mathematical definitions that may apply 
to any vector. Definition 16.4 refers to the CS theory in a more specific manner. This clarification 
is required to avoid ambiguity in using the mathematical notation for the signal vectors. 


Definition 16.1 Vector v ∈ R^N is said to be S-sparse if ‖v‖_0 = S, i.e., v has only S nonzero 
entries and all its other N − S entries are zero. 


Definition 16.2 S-sparse vector v_S ∈ R^N is made from non-sparse vector v ∈ R^N by keeping 
the S largest entries of v and zeroing all its other N − S entries. 


Definition 16.3 Vector v is said to be compressible if most of its entries are near zero. More 
formally, v ∈ R^N is compressible when ‖v − v_S‖_2 is negligible for some S ≪ N. 


These definitions may apply to any vector, which can be a signal vector or its projection on 
some orthonormal basis. In the following sections, we may apply Definitions 16.1 through 16.3 both 
to time-domain or space-domain signals and to their projections on orthonormal bases like Fourier, 
DCT, wavelet, etc. In any case, we can substitute the vector v in these definitions with the signal 
vector f or its compressive projection x, whichever is discussed in the context. 
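
Definitions 16.1 through 16.3 translate directly into a few lines of code. The sketch below is illustrative only; the function names are hypothetical, and "largest entries" is interpreted as largest in magnitude.

```python
import numpy as np

def is_s_sparse(v, S):
    """Definition 16.1: v has exactly S nonzero entries."""
    return np.count_nonzero(v) == S

def best_s_term(v, S):
    """Definition 16.2: keep the S largest-magnitude entries of v and zero the rest."""
    v_S = np.zeros_like(v)
    if S > 0:
        idx = np.argsort(np.abs(v))[-S:]
        v_S[idx] = v[idx]
    return v_S

def tail_error(v, S):
    """Definition 16.3: the l2 norm of v - v_S; v is compressible when this is negligible."""
    return np.linalg.norm(v - best_s_term(v, S))

v = np.array([5.0, 0.0, -0.01, 3.0, 0.02])
v_2 = best_s_term(v, 2)     # keeps the entries 5.0 and 3.0
err = tail_error(v, 2)      # l2 norm of the discarded entries (-0.01 and 0.02)
```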


Definition 16.4 Signal f is compressible under orthonormal basis Ψ when f = Ψx and x is 
compressible. The matrix Ψ is a real or complex orthonormal matrix with the basis vectors of Ψ as 
its rows. We also say that Ψ is a compressive basis for signal f. 


Most signals recorded from natural phenomena are compressible under Fourier, DCT, and the 
family of wavelet transforms. This is the fundamental fact behind every traditional compression 
technique. Audio signals are compressible under the Fourier transform. Images are compressible under 
DCT or wavelet transforms. A WSN also records a distributed spatiotemporal signal from a natural 
phenomenon, which can be compactly represented under the family of Fourier or wavelet orthonormal 
bases. CS is distinguished from traditional compression techniques in its signal acquisition method, 
as it integrates compression into the data acquisition layer and tries to recover the original signal 
from the fewest possible measurements. This is very advantageous in applications where acquiring 
individual samples is infeasible or too expensive. A WSN is an excellent application for CS since 
acquiring every single sample from the whole network generates traffic beyond the capacity of 
the resource-limited SNs and significantly limits the WSN lifetime. 


Definition 16.5 Measurement matrix Φ_m is an m × N real or complex matrix consisting of 
m < N basis vectors randomly selected from orthonormal measurement basis Φ that produces an 
incomplete measurement vector y ∈ C^m such that y = Φ_m f. 


Definition 16.6 Coherence between the measurement basis Φ and the compressive basis Ψ is 
denoted by μ(Φ, Ψ) and is equal to 

\[
\mu(\Phi, \Psi) = \sqrt{N} \max_{1 \le i, j \le N} \left| \langle \varphi_i, \psi_j \rangle \right|
\]

where for each 1 ≤ i, j ≤ N, φ_i's and ψ_j's are basis vectors of the Φ- and Ψ-domain, respectively, 
and ⟨·, ·⟩ is the inner product operation. 
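
A minimal sketch of the coherence computation follows. It assumes the common normalization in which the maximum inner-product magnitude is scaled by the square root of the dimension N, and it assumes that the basis vectors are stored as the rows of each matrix; the function name `coherence` is hypothetical.

```python
import numpy as np

def coherence(Phi, Psi):
    """sqrt(N) times the largest |<phi_i, psi_j>| over all pairs of rows of Phi and Psi."""
    N = Phi.shape[0]
    inner = Phi @ Psi.conj().T   # entry (i, j) is the inner product <phi_i, psi_j>
    return np.sqrt(N) * np.max(np.abs(inner))

N = 64
spikes = np.eye(N)                               # canonical (time-domain) basis
fourier = np.fft.fft(np.eye(N)) / np.sqrt(N)     # orthonormal DFT basis

mu_min = coherence(spikes, fourier)   # maximally incoherent pair, value close to 1
mu_max = coherence(spikes, spikes)    # a basis against itself, value sqrt(N)
```

Under this normalization the coherence lies between 1 and sqrt(N), and the spike/Fourier pair attains the lower end of that range.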


Now, we are ready to state the fundamental theorem of CS according to the aforementioned 
definitions. 


Theorem 16.1 [1] Suppose signal f ∈ R^N is S-sparse in the Ψ-domain, i.e., f = Ψx and x is 
S-sparse. We acquire m linear random measurements by randomly selecting m basis vectors of the 
measurement basis Φ. Assume that y ∈ C^m represents these incomplete measurements such that 
y = Φ_m f, where Φ_m is the measurement matrix. Then it is possible to recover f exactly from y by 
solving the following convex optimization problem 

\[
\hat{x} = \arg\min_{x \in \mathbb{C}^N} \| x \|_1 \quad \text{subject to} \quad y_k = \langle \varphi_k, \Psi x \rangle \ \text{ for all } k = 1, \ldots, m, \qquad (16.1)
\]

where φ_k's are the rows of the Φ_m matrix. The recovered signal will be f̂ = Ψx̂. 


In case of real measurement and compressive bases, problem (16.1) can be simplified to a 
linear program [2]. Noiselet [3] measurement matrices involve complex numbers and thus a linear 
program cannot solve problem (16.1). Convex optimization problem (16.1) with complex values 
can be cast to a second-order cone program (SOCP) [4]. 

Accurate signal recovery is possible when the number of measurements follows 

\[
m \ge C \cdot S \cdot \log N \cdot \mu^2(\Phi, \Psi) \qquad (16.2)
\]

where C > 1 is a small real constant [5]. 
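
To get a feel for Equation 16.2, the bound can be evaluated for a few parameter choices. In the sketch below the function name is hypothetical, the logarithm is assumed to be the natural logarithm, and the constant C is left as a parameter because its actual value is not given in the text; the numbers are illustrative only.

```python
import math

def min_measurements(S, N, mu, C=1.0):
    """Smallest integer m satisfying m >= C * S * log(N) * mu**2 (natural logarithm assumed)."""
    return math.ceil(C * S * math.log(N) * mu ** 2)

# Example: a 10-sparse signal of length 1024.
m_incoherent = min_measurements(S=10, N=1024, mu=1.0)   # perfectly incoherent bases
m_coherent = min_measurements(S=10, N=1024, mu=2.0)     # doubling mu quadruples the bound
```

The quadratic dependence on the coherence is the reason an incoherent measurement basis matters as much as a sparse representation.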

From Equation 16.2, it is clear why sparsity and incoherence are important in CS. To efficiently 
incorporate the CS theory in a specific sampling scenario, we need the measurement and compressive 
bases to be as incoherent as possible to decrease the parameter μ in Equation 16.2. Moreover, the 
compressive basis must be able to effectively compress the signal f to decrease S in Equation 16.2. 
When these two preconditions hold for a certain sampling configuration, it is possible to recover 
the signal f from m measurements, where m can be much smaller than the dimension of the 
original signal [1]. 

Haar wavelet and noiselet are two orthonormal bases with perfect incoherence and hence are
of special interest for CS theory [6]. With noiselet as the measurement basis ($\Phi$) and Haar
wavelet as the compressive basis ($\Psi$), the pair has a small constant coherence that is
independent of the dimension of the spatial signal [7]. This makes them very useful in many CS
applications. Although CS theory suggests Haar wavelet and noiselet as the perfect compressive and
measurement pair, in practice it may not be feasible to compute noiselet measurements in a WSN
unless we embed parts of the noiselet transform matrix inside the SNs. Interestingly, random
matrices such as a Gaussian matrix with independent and identically distributed (i.i.d.) entries
drawn from $\mathcal{N}(0, 1)$ have low coherence with any fixed orthonormal basis [5]. The
elements of such a random matrix can be calculated on the fly using a pseudorandom number
generator that is common to the SNs and the sink. When the generator at every SN is initialized
with the id-number of that SN, the sink can reproduce the measurement matrix exactly. Note that in
this case there is no need for centralized control to update the measurement matrix, and the values
of the measurement matrix need not be stored inside the SNs. Therefore, using random
measurement matrices gives us more flexibility and requires less memory on the SNs. The price is
that, because a random measurement matrix is slightly more coherent with the fixed compressive
basis, the number of required measurements m increases according to Equation 16.2.
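
The trade-off above can be checked numerically. The following is a minimal sketch, assuming the
commonly used coherence definition $\mu = \sqrt{N} \max |\langle \varphi_k, \psi_j \rangle|$ (the
definition belonging to Equation 16.2 is not restated in this section, so this form is an
assumption), a DCT basis as the fixed compressive basis, and unit-norm Gaussian rows as
measurement vectors. The function names and the size N = 64 are illustrative only.

```python
import math
import random

def dct_basis(n):
    """Rows are the orthonormal DCT-II basis vectors of R^n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def gaussian_rows(m, n, seed):
    """m rows with i.i.d. N(0, 1) entries, each rescaled to unit norm."""
    rng = random.Random(seed)
    rows = []
    for _ in range(m):
        row = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(v * v for v in row))
        rows.append([v / norm for v in row])
    return rows

def coherence(phi_rows, psi_rows):
    """sqrt(n) times the largest absolute inner product (assumed definition)."""
    n = len(psi_rows[0])
    largest = max(abs(sum(a * b for a, b in zip(p, q)))
                  for p in phi_rows for q in psi_rows)
    return math.sqrt(n) * largest

n = 64
psi = dct_basis(n)
mu_random = coherence(gaussian_rows(n, n, seed=1), psi)  # random vs. fixed basis
mu_self = coherence(psi, psi)                            # worst case: sqrt(n)
print(mu_random, mu_self)
```

Under this definition the coherence always lies between 1 and $\sqrt{N}$; the random rows land
well below the worst case that a basis reaches when paired with itself.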

CS is also very stable against noisy measurements and can handle signals that are not strictly
sparse but compressible. Being able to transform the signal f into a strictly sparse vector in the
$\Psi$-domain is a very idealistic condition. In practice, f is transformed into a compressible form
with many near-zero entries. Candès et al. [8,9] have shown that if $\|x - x_S\|_2 < \epsilon$ for
some integer S < N and a small real constant $\epsilon$, then the recovery error obtained by solving
problem (16.1) will be about $O(\epsilon)$. Similarly, in a noisy environment where the measurement
vector is corrupted by additive white Gaussian noise (AWGN) $\sim \mathcal{N}(0, \sigma^2)$, the
recovery error is bounded by $O(\sigma^2)$.


16.3 Implementation of Compressive Sensing for Wireless 
Sensor Networks 


One of the first detailed implementations of CWS was introduced by Luo et al. [10,11] as compressive
data gathering (CDG) for large-scale WSNs. They made a comprehensive comparison between CS
and traditional compression techniques, showing that CS leads to a more efficient and stable signal
acquisition technique than traditional methods such as in-network compression [12]
and distributed source coding (DSC) [13]. By using overcomplete dictionaries [14] in solving (16.1),
CDG is also able to tackle abnormal sensor readings and cope with unpredicted events in the
operational environment of the WSN. In their initial attempt, Luo et al. [10] applied a random
measurement matrix generated by a common pseudorandom number generator embedded in all SNs
and seeded by a unique number broadcast in the network. As noted earlier, random matrices such as
a Gaussian matrix with i.i.d. entries drawn from $\mathcal{N}(0, 1)$ have low coherence with any
fixed orthonormal basis [5].


16.3.1 Compressive Wireless Sensing 


Bajwa et al. [15] first proposed an implementation of CS for WSNs, which was based on analog
data communication. Although their abstract idea is barely feasible to realize, their model
provides a simple technique to acquire compressive measurements from a WSN. Therefore, we
begin the study of major CS implementations for WSNs by introducing CWS. Let us assume
$\Phi$ is a measurement basis and $\Phi_m$ is a measurement matrix that has m rows and N columns.
Here, N is the total number of SNs and m is the required number of compressive measurements; m
can be determined by Equation 16.2 according to the compressibility level of the spatial signal.
The measurement matrix $\Phi_m$ is made of m randomly selected basis vectors from the measurement
basis $\Phi$. When we are using a random Gaussian basis, $\Phi_m$ can simply be populated with
random real numbers drawn from a Gaussian distribution with zero mean and unit variance. Whether
we use random measurements calculated on the fly or embedded measurement vectors, we have to
calculate the product $\Phi_m f$. Here, $f \in \mathbb{R}^N$ is the spatial signal, whose dimension
equals the number of SNs. The matrix multiplication $y = \Phi_m f$ is a linear measurement process
that must be performed in a distributed manner. The result is the vector $y \in \mathbb{R}^m$, which
has a lower dimension than the original signal. The CS recovery algorithm then has to recover the
original vector f from y.

To understand how CWS works, we consider the simplest case where all SNs can directly
communicate with the sink. We know that $\Phi_m$ has N columns. Suppose each SN embeds one
column of $\Phi_m$. When using random measurements, no embedded data are required, because the
ith column of $\Phi_m$ can be generated on the fly. The only requirement here is that the entries of
$\Phi_m$ can be regenerated at the sink. Therefore, either the columns of $\Phi_m$ are embedded
into the SNs before deployment of the WSN, or the SNs produce their corresponding columns using a
pseudorandom Gaussian number generator that is known to the sink. The same $\Phi_m$ can
then be reproduced at the sink using the same algorithm and a predetermined seed. When the SNs
initialize the pseudorandom Gaussian number generator with their own id, the whole
sequence of random entries of the measurement matrix can be regenerated at each sampling
period.
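
The seeding idea can be sketched in a few lines. This is only an illustration under stated
assumptions: Python's `random.Random` stands in for the shared pseudorandom Gaussian generator,
the node id is used directly as the seed, and the node ids and the helper names `node_column` and
`sink_matrix` are hypothetical.

```python
import random

def node_column(node_id, m):
    """Column owned by one SN: m draws from N(0, 1), generator seeded with the
    node id (assumed seeding convention)."""
    rng = random.Random(node_id)
    return [rng.gauss(0.0, 1.0) for _ in range(m)]

def sink_matrix(node_ids, m):
    """The sink rebuilds the m x N matrix column by column from the ids alone."""
    columns = [node_column(node_id, m) for node_id in node_ids]
    return [[columns[j][i] for j in range(len(node_ids))] for i in range(m)]

node_ids = [101, 102, 103, 104, 105, 106]   # hypothetical SN identifiers
m = 3
phi_m = sink_matrix(node_ids, m)

# Node 104 generates its column locally, without any stored matrix data.
local = node_column(104, m)
sink_copy = [row[node_ids.index(104)] for row in phi_m]
print(local == sink_copy)
```

Because both sides run the same generator from the same seed, no matrix entries have to be
stored on the SN or sent over the network.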

All SNs record the value of the intended physical phenomenon at the same time. This means that
the nodes have synchronized clocks (we discuss later the disadvantages of the need for synchronized
clocks). Each SN then multiplies its recorded real value by its own column vector, which is part of
the measurement matrix $\Phi_m$. We set $w_i = f_i \varphi_i$, where $f_i$ is the value recorded by
the ith SN and $\varphi_i$ is the ith column of $\Phi_m$. When all SNs have finished this
multiplication, the vector $w_i$ is ready to be transmitted from each SN. If all the SNs transmit
their own $w_i$ at the same time, the result of the matrix multiplication
$y = \Phi_m f = \sum_i w_i$ is accumulated at the sink. Figure 16.4 illustrates this setup.

CWS assumes that the SNs can be perfectly synchronized such that the measurement vector
y can be accurately accumulated at the sink. Bajwa et al. [15] have also considered the effect of
AWGN present during the radio communication. As discussed earlier, the recovery error in the
presence of AWGN $\sim \mathcal{N}(0, \sigma^2)$ will be on the order of $\sigma^2$.

The accumulation takes place by adding vectors of the same size across the WSN. Since addition
is commutative and associative, the accumulation can be done in any order. This property of CS
measurement has a very important advantage: the same approach as discussed earlier can be
applied to a star topology, a chain topology, or more complex tree topologies. As long as tree
routing is applied, accumulation can be done over many levels. Thus, we can extend the very simple
topology in Figure 16.4 to the multi-hop WSN depicted in Figure 16.5.
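
The order independence can be illustrated with a small sketch in which every node adds its own
contribution to the partial sums received from its children. The tree, the readings, and the
helper names are hypothetical, and the columns are generated from node ids only for convenience.

```python
import random

def node_column(node_id, m):
    """Column of the measurement matrix owned by a node (seeded by its id)."""
    rng = random.Random(node_id)
    return [rng.gauss(0.0, 1.0) for _ in range(m)]

def contribution(node_id, reading, m):
    """w_i = f_i * phi_i: the reading scaled onto the node's own column."""
    return [reading * v for v in node_column(node_id, m)]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def accumulate(node_id, children, readings, m):
    """Partial sum a node sends to its parent: own w_i plus its children's sums."""
    total = contribution(node_id, readings[node_id], m)
    for child in children.get(node_id, []):
        total = add(total, accumulate(child, children, readings, m))
    return total

# Hypothetical tree: sink <- 1 <- {2, 3}, and 3 <- {4, 5}
children = {1: [2, 3], 3: [4, 5]}
readings = {1: 20.5, 2: 21.0, 3: 19.8, 4: 20.1, 5: 35.0}
m = 3
y_tree = accumulate(1, children, readings, m)

# Reference: every node contributes directly to the sink (star topology).
y_star = [0.0] * m
for node_id in sorted(readings):
    y_star = add(y_star, contribution(node_id, readings[node_id], m))

print(all(abs(a - b) < 1e-9 for a, b in zip(y_tree, y_star)))
```

Each node forwards exactly m values regardless of how many descendants it has, which is the
property the later discussion of balanced energy consumption relies on.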

The CWS model in its most basic form, which is based on synchronized superposition of
signals, is infeasible with the hardware of today’s typical SNs. However, it has interesting
attributes in its abstract form. Note that the accumulation of the vectors can be done over the
air and without further computation by the SNs. In the star topology, the accumulation takes
place instantly when all SNs have transmitted their own calculated vectors. Today’s common
wireless medium access control (MAC) protocols for WSNs are all based on detecting and avoiding
collisions, and a large body of work has been devoted to efficient MAC protocols that face
minimum collisions. CWS, by contrast, requires a synchronized collision. We believe that
implementing a communication layer with perfectly synchronized collisions is even more challenging
than making a wireless MAC protocol with minimum collisions. This topic can be considered an open
area in the application of CS in WSNs. There can be certain limited applications where analog CWS
can be applied instead of the digital communication used in common MAC protocols.
The abstract model of CWS prepares us for a more practical implementation
of CS for WSNs, which we discuss in the next subsection.

Figure 16.4 CWS in a WSN with star topology.

Figure 16.5 CWS in a multi-hop WSN with tree topology.


16.3.2 Compressive Data Gathering 


Luo et al. [10,11] introduced a model that abstracts CWS data aggregation
as message passing through the WSN. Therefore, we no longer discuss the data transfer as analog
signals. Instead, data transmission is modeled as messages that are transferred over a
digital wireless communication channel. CDG at its heart remains very much like CWS. It considers
only spatial sampling, and the spatial measurements are acquired periodically. CDG also addresses
the problem of abnormal sensor readings. Abnormal readings may render the signal incompressible
in the compressive basis. Figure 16.6 shows an example of the effect of abnormal
samples on the compressibility of a signal. Figure 16.6a shows the original signal, which is sparse
in the DCT domain, as can be seen in Figure 16.6b. In Figure 16.6c, we see the same signal with only
two abnormal sensor readings. The value of each of those two samples is either too high or too low
compared with the average of the samples. Figure 16.6d shows the DCT of the abnormal signal.
Obviously, the signal contaminated with abnormal values is not compressible under the DCT. This
violates the preconditions required for the operation of traditional CS. Fortunately, this problem
can be solved by using overcomplete dictionaries [14].

Figure 16.6 Effect of abnormal readings on the DCT projection of a signal. (a) original signal,
(b) DCT of the original signal, (c) original signal contaminated with abnormal readings,
and (d) DCT of the distorted signal.


CDG proposes the use of overcomplete dictionaries to detect abnormal sensor readings. Assume
f is the spatial signal vector that may have been contaminated with abnormal readings. Remember
that we assume that the abnormal readings in the signal are sparse; otherwise, it is not possible to
recover a heavily distorted signal by any efficient data acquisition mechanism. We can decompose
the vector f into two vectors:


$$ f = f_0 + d_s, \tag{16.3} $$


where
$f_0$ is the normal signal, which is sparse or compressible in the compressive basis $\Psi$
$d_s$ contains the deviated values of the abnormal readings


The vector $d_s$ is supposed to be sparse since the abnormal sensor readings are rare and sporadic.
Therefore, $d_s$ is sparse in the space domain. The original signal vector f can then be represented
as a linear combination of two sparse signals, and we can rewrite Equation 16.3 as


$$ f = \Psi x_0 + I x_s, \tag{16.4} $$


where $x_0$ is the sparse projection of the normal component $f_0$ on the $\Psi$-domain. The vector
$d_s$ is in fact equal to $x_s$ since $d_s$ is already sparse in the space domain; $I$ is the
identity matrix and hence $d_s = x_s = I x_s$. It is possible to project the signal f on a single
overcomplete basis. First, we need to construct an augmented transformation matrix for an
overcomplete system, named $\Psi'$. Assume that the matrix $\Psi'$ is made by placing the matrices
$\Psi$ and $I$ next to each other. Formally speaking, $\Psi' = [\Psi \mid I]$, which now has N rows
and 2N columns.

Donoho et al. [14] have shown that stable recovery of a sparse signal is possible under an
overcomplete system. They have also proved that their method is effective for measurements
contaminated with noise. The recovery takes place using a convex optimization similar to the
typical CS recovery method. The recovery error in the presence of noise would be on the order
of the additive noise magnitude. Since we are going to search for a solution in an augmented
dictionary, our target vector now has a dimension of 2N instead of N. Note that
$f \in \mathbb{R}^N$ can be represented by a sparse projection on the $\Psi'$ basis:


$$ f = \Psi' x, \qquad x = [x_0^T \; x_s^T]^T \tag{16.5} $$


Here, x has a dimension of 2N and is made by stacking $x_0$ on top of $x_s$.

Now, we apply the CS Theorem 16.1 to this new pair of signal and measurements. The
measurement vector can be calculated using a random measurement matrix as before. The estimated
solution vector will be $\hat{x} = [\hat{x}_0^T \; \hat{x}_s^T]^T$. The original signal can be
recovered by calculating $\hat{f} = \Psi' \hat{x}$, which is expected to have a limited error
compared to f. Again, the recovery error depends on the number of measurements m and the noise that
is present in the communication channel. CDG is also able to detect the location of abnormal events
in the WSN: after solving the convex optimization problem (16.1) for the augmented system and
finding the solution $\hat{x} = [\hat{x}_0^T \; \hat{x}_s^T]^T$, the nonzero elements of
$\hat{x}_s$ determine the positions of the abnormal events.
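
The structure of the augmented system can be made concrete with a short sketch. It only builds
$\Psi' = [\Psi \mid I]$ and checks the representation in Equations 16.4 and 16.5; the recovery
step (solving problem (16.1)) is deliberately left out, so the coefficient vector is taken as
known. The DCT basis, the size N = 8, and the coefficient values are illustrative assumptions.

```python
import math

def dct_synthesis(n):
    """N x N matrix whose columns are orthonormal DCT-II basis vectors."""
    cols = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        cols.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return [[cols[k][i] for k in range(n)] for i in range(n)]

def augmented(psi):
    """Psi' = [Psi | I]: N rows and 2N columns."""
    n = len(psi)
    return [psi[i] + [1.0 if j == i else 0.0 for j in range(n)]
            for i in range(n)]

def matvec(a, x):
    return [sum(r * v for r, v in zip(row, x)) for row in a]

n = 8
psi = dct_synthesis(n)
psi_aug = augmented(psi)

x0 = [10.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]     # normal part, sparse in DCT
xs = [0.0, 0.0, 0.0, 25.0, 0.0, 0.0, -18.0, 0.0]   # two abnormal readings
x = x0 + xs                                        # stacked vector of length 2N

f = matvec(psi_aug, x)                                      # Equation 16.5
f_check = [a + b for a, b in zip(matvec(psi, x0), xs)]      # Equation 16.4

# Positions of abnormal readings are the nonzero entries of the x_s half.
abnormal = [i for i, v in enumerate(x[n:]) if abs(v) > 1e-9]
print(abnormal)
```

In an actual deployment the stacked vector would come from the convex solver applied to the
compressive measurements rather than being given.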

CDG also provides an analysis of the network capacity when using CS for acquiring the
distributed spatial signal. A comparison with baseline data transmission in a WSN with uniformly
random distribution of SNs shows that a network capacity gain of N/m is possible using CDG.
Here, N is the number of SNs and m is the number of required compressive measurements; m depends
on the sparsity or compressibility of the $\Psi$-transform of the signal vector f. For an S-sparse
compressible signal, roughly m = 4S measurements are required to recover the signal with
acceptable accuracy [11].
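
As a worked illustration of these two rules of thumb, take a hypothetical network of N = 1000 SNs
whose signal is S = 25 sparse (both numbers are assumptions chosen for the arithmetic, not values
from [11]):

$$ m \approx 4S = 4 \times 25 = 100, \qquad \frac{N}{m} = \frac{1000}{100} = 10. $$

Under these assumptions the capacity gain would be about tenfold; a less compressible signal
(larger S) shrinks the gain proportionally.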


16.3.3 Distributed Compressed Sensing 


Wakin et al. in distributed compressed sensing (DCS) [16] give a model of a WSN in which each node
has a direct link to the sink. DCS considers not only the spatial correlation of a distributed
signal but also the temporal correlation over a period of time. Therefore, DCS can be regarded
as a spatiotemporal sampling technique. The spatiotemporal signal is modeled as a combination
of several temporal signals. DCS assumes that two components contribute to the spatiotemporal
signal recorded by the WSN. The first is the common component, which is sensed by all SNs. The
second consists of the signals contributed by individual SNs, which can be sparse. According to
this configuration, DCS defines three joint sparsity models (JSMs), namely JSM-1, JSM-2, and
JSM-3, which we briefly explain below. Suppose there are J SNs, each of which produces a temporal
signal $f_j$ with dimension N. This means that J SNs are each producing N samples in every
sampling period. We assume that there is a fixed $\Psi$-basis for $\mathbb{R}^N$ on which the
signals can be projected sparsely. Now, we explain the attributes of the three JSMs.

JSM-1: Sparse common component + innovations. In this model, all of the temporal signals
share a common component, which is compressible in the $\Psi$-domain, while every SN may contribute
a sparse innovation component:


$$ f_j = z_C + z_j, \quad j \in \{1, 2, \ldots, J\} \tag{16.6} $$

such that

$$ z_C = \Psi \theta_C, \; \|\theta_C\|_0 = K \quad \text{and} \quad z_j = \Psi \theta_j, \; \|\theta_j\|_0 = K_j \tag{16.7} $$


In this formula, $z_C$ is the common component, which has the K-sparse representation $\theta_C$.
Every SN has its own innovation $z_j$, which is $K_j$-sparse in the $\Psi$-domain. Note that here
$\Psi$ is an N × N orthonormal matrix. This model can describe a wide range of WSN applications.
Assume a WSN that is deployed in an outdoor environment to record microclimate data such as ambient
air temperature. Overall, a nearly constant temperature close to the average is observed. There
might be some more structure present in the signal values over a geographical area, but the signal
is expected to be temporally compressible. In some spots of the operational environment, there
is a temperature difference because of local events or conditions such as shade or flowing water.
Therefore, the signal in this example can be modeled well by JSM-1.
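
A JSM-1 ensemble is easy to synthesize, which may help in reading Equations 16.6 and 16.7. The
sketch below is a simplification under stated assumptions: the basis is taken to be the identity
(so each component equals its coefficient vector), all nonzero coefficients are positive so that
no cancellation occurs, and the sizes N, J, K, and K_j are hypothetical.

```python
import random

def sparse_vector(n, k, rng, scale):
    """Length-n vector with exactly k nonzero (positive) entries."""
    theta = [0.0] * n
    for idx in rng.sample(range(n), k):
        theta[idx] = rng.uniform(0.5, 1.0) * scale
    return theta

def l0(v):
    """Number of nonzero entries."""
    return sum(1 for a in v if a != 0.0)

rng = random.Random(7)
N, J = 16, 4        # samples per signal and number of SNs (hypothetical)
K, K_j = 3, 1       # sparsity of the common part and of each innovation

# Psi is the identity here, so z_C = theta_C and z_j = theta_j.
theta_c = sparse_vector(N, K, rng, scale=20.0)
innovations = [sparse_vector(N, K_j, rng, scale=2.0) for _ in range(J)]
signals = [[c + z for c, z in zip(theta_c, theta_j)]   # f_j = z_C + z_j
           for theta_j in innovations]

print([l0(f_j) for f_j in signals])
```

Every signal carries the same K common coefficients plus at most K_j of its own, which is the
redundancy that joint recovery exploits.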

JSM-2: Common sparse supports. In this model, the signals recorded by all SNs can be projected
sparsely on a single fixed orthonormal basis, but the coefficients may differ for every SN. Formally
speaking, this means that


$$ f_j = \Psi \theta_j, \quad j \in \{1, 2, 3, \ldots, J\} \tag{16.8} $$


where each coefficient vector $\theta_j$ is supported only on the same subset of basis vectors
$\Omega \subset \{1, 2, 3, \ldots, N\}$ such that $|\Omega| = K$. Therefore, all temporal signals
recorded by the SNs have an $\ell_0$ sparsity of K, but the amplitudes of the coefficients may
differ for each SN. This model is useful in situations where all signals have the same support in
the frequency domain but may experience phase shifts and attenuations due to signal propagation.
In scenarios like acoustic localization, it is necessary to recover every signal individually.
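
The common-support property of JSM-2 can be sketched the same way. The sizes and the random
amplitudes below are hypothetical; only the coefficient vectors are generated, since applying the
basis does not change the support.

```python
import random

rng = random.Random(11)
N, J, K = 16, 5, 3                            # hypothetical sizes
support = sorted(rng.sample(range(N), K))     # common support Omega, |Omega| = K

def coefficients(rng):
    """Coefficient vector theta_j: nonzero only on the shared support."""
    theta = [0.0] * N
    for idx in support:
        sign = rng.choice([-1.0, 1.0])
        theta[idx] = sign * rng.uniform(0.5, 2.0)   # amplitude differs per SN
    return theta

thetas = [coefficients(rng) for _ in range(J)]
supports = [[i for i, v in enumerate(t) if v != 0.0] for t in thetas]
print(support, all(s == support for s in supports))
```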

JSM-3: Non-sparse common + sparse innovations. This model can be regarded as an extension
of JSM-1, since it does not require the common signal to be sparse in the $\Psi$-domain; however,
the innovations of the individual SNs are still sparse:


$$ f_j = z_C + z_j, \quad j \in \{1, 2, \ldots, J\} \tag{16.9} $$

$$ z_C = \Psi \theta_C \quad \text{and} \quad z_j = \Psi \theta_j, \; \|\theta_j\|_0 = K_j. \tag{16.10} $$

One application of this model is a distributed deployment of cameras, where the overall
picture can be presented as a reference picture with some small innovations sensed by different
cameras. These differences from the base picture depend on the position of each camera.

Wakin et al. [16] have also proposed an efficient algorithm developed especially for joint
signal recovery. In general, the reconstruction accuracy or recovery probability increases with more
measurements. For separate signal recovery, the performance decreases when we add more SNs,
that is, when the value of J increases. This is expected according to the traditional theory of
CS: the dimension of the signal increases, but the number of measurements does not increase
according to Equation 16.2. On the other hand, according to the numerical experiments done by
Wakin et al., the probability of accurate reconstruction increases when more SNs are added to the
network. Increasing J brings more correlation between the SNs. Therefore, the original signal can
be recovered accurately with fewer measurements. Most interestingly, DCS leads to
asymptotically optimal decoding in JSM-2 as the number of SNs increases [16].


16.3.4 Compressive Sensing over ZigBee Networks 


ZigBee is a high-level wireless communication specification based on the IEEE 802.15.4 standard
[17]. It is widely used in today’s SN platforms like MicaZ and Telos. As mentioned earlier, none
of the proposed adaptations of CS to WSNs is perfectly suited to the typical hardware and
software platforms currently in use. CWS requires analog signal superposition and hence
perfect time synchronization between nodes. Furthermore, it does not yet consider the effect of
multipath fading and other propagation noise. CDG is basically very similar to CWS and does not
yet provide a practical implementation. DCS and the JSMs are best suited for star topologies
where each node can transmit its data directly to the sink. Again, as with CWS, a time-synchronized
signal superposition is preferred for DCS; otherwise, time division multiple access methods
become very time consuming.

Caione et al. [18] proposed a simple but very effective improvement for CS-based signal
acquisition in WSNs. In the previous section, we have seen that
when using CS, the number of data items to be transmitted by each SN is equal to m and hence
the energy consumption is balanced. This means that even a leaf node, which needs to send only one
value, must perform m transmissions. We know that most of the nodes in a tree structure
belong to the lower levels of the tree (levels near the leaves). Therefore, a majority of nodes
send much more data than the useful information that they produce. Caione et al. introduced
a hybrid CS method for WSNs in which nodes perform one of two operations:


• Pack and forward (PF): The SN packs its own recorded data along with the data received
from its children SNs.

• CS measurement: The SN accumulates its own data and the data that have been eventually
received from its children using CS measurement techniques. It then sends the CS
measurement vector to its parent node.


All SNs follow two rules throughout the acquisition, packing, forwarding, and accumulation of data:


• An SN applies the CS measurement accumulation method only when the length of the resulting
message is less than that when using PF. In other words, each node decides whether to use
PF or CS depending on the length of the outgoing message. This way, the SNs can minimize
the energy consumed during radio transmission, which is the major resource bottleneck
for SNs.

• When the message received by an SN is a CS measurement, the SN is obligated to apply
the CS accumulation process. In other words, when the message changes its type from PF
to CS, it continues to be in the CS form all the way through the network until it reaches
the sink.
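
The two rules can be sketched as a recursive decision over the routing tree. This is an
illustration of the rules as stated above, not the implementation from [18]; the tree, the value
of m, and the function name `outgoing` are hypothetical, and message length is counted in data
items.

```python
def outgoing(node, children, m):
    """Return (mode, length) of the message a node sends to its parent.

    mode is "PF" or "CS"; length is the number of data items in the message.
    """
    received = [outgoing(child, children, m) for child in children.get(node, [])]
    # Rule 2: once any incoming message is a CS measurement, stay in CS form.
    if any(mode == "CS" for mode, _ in received):
        return ("CS", m)
    # Length if the node packed its own reading with everything it received.
    pf_length = 1 + sum(length for _, length in received)
    # Rule 1: use CS only when it makes the outgoing message shorter than PF.
    if m < pf_length:
        return ("CS", m)
    return ("PF", pf_length)

# Hypothetical routing tree; node 1 is the only neighbor of the sink.
children = {1: [2, 3], 2: [4, 5], 3: [6], 6: [7, 8]}
m = 3
plan = {node: outgoing(node, children, m) for node in range(1, 9)}

hybrid_items = sum(length for _, length in plan.values())
pure_cs_items = m * len(plan)
print(plan, hybrid_items, pure_cs_items)
```

In this small tree the leaves send a single value, nodes 2 and 6 still pack and forward, and only
nodes 3 and 1 send CS vectors, giving 16 transmitted items instead of the 24 that pure CS would
need.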


These two simple rules and operations help to reduce the overall energy consumption of the SNs.
However, the energy consumption is no longer balanced. In hybrid CS, there is still a risk
of network partitioning, since the nodes near the sink may deplete their energy earlier than the
leaf SNs. Xiang et al. [19] have tried to optimize the aggregation tree of hybrid CS, though they
do not address the unbalanced energy consumption of intermediate-level nodes.


16.3.5 Further Improvements to Compressive Sensing in Wireless 
Sensor Networks 


The CS theory has found applications in many scientific and industrial areas like magnetic resonance 
imaging, multimedia, genetics, and WSNs. In every area, there are customized algorithms and 
techniques that try to improve the CS performance for a specific data acquisition and recovery 
technique. 

Mahmudimanesh et al. [20,21] show that it is possible to virtually reorder the samples
of a spatial signal in a WSN to get a more compressible view of the signal. When the signal is
more compressible, we can recover it with the same quality from fewer measurements,
or we can recover the signal from a fixed number of measurements with a higher
reconstruction quality. The proposed algorithm takes the current state of the environment, that is,
the current spatial signal that is distributively recorded by the WSN. The algorithm then produces
a permutation of the nodes under which the signal is more compressible when projected on a real
fixed compressive basis like DCT or Fourier.
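
The effect of reordering on compressibility can be seen on a toy signal. The sketch below does not
reproduce the algorithm of [20,21]; as a stand-in it simply sorts the readings, which is an
assumption made only to obtain some permutation that smooths the signal. The readings and the
energy-fraction measure are likewise illustrative.

```python
import math

def dct(signal):
    """Orthonormal DCT-II of a list of samples."""
    n = len(signal)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, v in enumerate(signal)))
    return out

def top_k_energy_fraction(signal, k):
    """Share of the signal energy held by the k largest DCT coefficients."""
    energies = sorted((c * c for c in dct(signal)), reverse=True)
    return sum(energies[:k]) / sum(energies)

readings = [5.0, 1.0, 4.0, 2.0, 3.0, 8.0, 6.0, 7.0]   # node order 0..7

# Stand-in permutation (assumption): visit the nodes in order of their reading.
permutation = sorted(range(len(readings)), key=lambda i: readings[i])
reordered = [readings[i] for i in permutation]

before = top_k_energy_fraction(readings, 2)
after = top_k_energy_fraction(reordered, 2)
print(permutation, before, after)
```

The sink has to know the permutation to map recovered values back to node positions, which is one
reason a permutation that stays useful over several sampling rounds matters.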

Interestingly, the near-optimal permutation found in one sampling round can be applied to the
next sampling round, and an almost equal performance gain is achievable over the next few sampling
intervals. This attribute mostly depends on the dynamics of the operational environment. When
the spatial signal does not change drastically over a short period of time, it is possible to apply
the methods proposed in [21]. In most applications of WSNs, especially when the dynamics of the
environment are not very rapid, these methods result in better performance. The authors do not
discuss the implementation of their method on real-world WSNs. It is a rather general approach that
can be applied to CWS, CDG, or DCS with JSMs. Hybrid CS, which has been discussed in the previous
section, can also be combined with this reordering technique.


16.4 Summary 


In this chapter, we have explained the fundamentals of the emerging theory of CS and its applications
in WSNs. We have seen that CS provides a very flexible, tunable, resilient, and yet efficient
sampling method for WSNs. We have listed the major advantages of CS over transform
coding and other signal compression methods that are traditionally used in WSNs. CS can
guarantee balanced energy consumption by all of the SNs. This important property avoids
network partitioning and improves overall resource management of the whole network. It is easy to
implement on the restrictive hardware and software platforms of today’s typical SNs. Its resilience
against noise and packet loss makes it very suitable for the harsh operational environments of WSNs.

We have reviewed the most important variants of CS that are especially devised for WSNs. In
recent years, with the progress of CS theory in the WSN area, more and more realistic applications
of CS have been presented. CWS was one of the first variants of CS for WSNs. Although it did not
take the established hardware platforms of SNs into consideration, it was a novel idea that opened
a new avenue of research in the area of signal acquisition for WSNs. CDG, which was directly based
on the CWS model, enabled the detection of unexpected events. Event detection is a crucial
requirement of many critical WSN applications.

DCS with its JSMs has categorized a wide range of applications for CS in WSNs. DCS
provided an efficient solution to the problem of joint signal recovery. DCS has many applications
in different distributed sampling scenarios, such as WSNs with star topology, camera arrays, and
acoustic localization. However, it is not very suitable for multi-hop WSNs. Hybrid CS is a simple
yet effective solution for multi-hop WSNs consisting of very resource-limited SNs. It is one of
the most recent improvements to the typical CS implementation in WSNs. The only disadvantage of
using hybrid CS is the unbalanced energy consumption, which can lead to network partitioning.

CS has proved itself as a very advantageous method for distributed signal acquisition. Multi- 
hop implementation of CS can be regarded as the next challenge toward realizing an efficient 
distributed sampling method for WSNs. Moreover, further research is required to examine the 
effect of faulty nodes on the performance of CWS, DCS, CDG, and hybrid CS. We believe that a 
fault-tolerant CS-driven sampling and routing method for WSNs can significantly contribute to this 
research area.


A cyber physical system (CPS) is a system that tightly combines and coordinates its
computational/cyber elements with its physical elements. Wireless sensor communications have been
extensively used to enhance the intelligence of CPS, where sensors gather information from the
physical world and convey it to the central cyber system that further processes the sensor data [1].
For example, in [2] the authors have designed a cyber system employing sensor techniques to monitor
the algae growth in Lake Tai, China. As stated in the paper, the design comprises sensors and
actuators to monitor the severity of the algae bloom and to dispatch salvaging boats.
Similarly, the authors in [3] have proposed a CPS approach that navigates users in locations with
potential danger and takes advantage of the interaction between users and sensors to ensure
the timely safety of the users.

While sensors are widely deployed in CPS, they are often targeted by attackers. Many types of
physical and cyber attacks on a sensor are possible. Physical attacks may range from
short-circuiting the sensors to damaging them such that they present false data. Physical
attacks can generally be mitigated by deploying proper security measures in the area. Cyber attacks
generally include software, protocol, algorithm, or network-based attacks, which can be much
more complicated and harder to defend against. In this chapter, we consider attacks on sensors
deployed in a water supply system and present a framework to detect such attacks.

The water supply system is equipped with water level sensors, to measure water level, and drifters,
to measure water velocity. Data collected by these sensors are wirelessly transmitted to a central
controller that implements a Kalman filter detection system. The Kalman filter generates estimates
of the sensor data at the next time step based on the data from the current time step and from
neighboring sensors. These estimated readings are compared with the actual readings to look for
discrepancies. Without loss of generality, the water supply system is modeled using the
Saint-Venant model, a nonlinear hyperbolic mathematical model. The model is further refined to
yield a linear and discrete mathematical model such that the Kalman filter can be applied.


17.1 Introduction 


Recently, the security aspects of water systems have drawn attention in the literature. For example,
vulnerabilities of the supervisory control and data acquisition systems used to monitor and control
modern irrigation canal systems are investigated in [4]. The paper linearizes the
shallow water partial differential equations (PDEs) used to represent water flow in a network of
canal pools. As shown in [4], the adversary may use knowledge of the system dynamics to
develop a deception attack scheme based on switching the PDE parameters and proportional (P)
boundary control actions, to withdraw water from the pools through offtakes. The authors in
[5] have analyzed false data injection attacks on control systems using Kalman filters and Linear
Quadratic Gaussian (LQG) controllers. The paper proves that a system with an unstable eigenvector
can be perfectly attackable and provides means to derive an attack sequence that can bypass the
$\chi^2$ detectors of the system.

While being increasingly deployed in CPS such as the water system, sensors are becoming
easy targets for attackers seeking to compromise the security of the system. Since these sensors
can be in remote and dangerous areas, deploying human resources to guard them is not a practical
approach. Hence, it is important that the cyber system is intelligent enough to detect attacks on the
sensors. The water system described here is assumed to have two types of sensors: water level
sensors and velocity sensors. The water level sensors measure the water level from the ground, and
the velocity sensors measure the velocity of water at a certain location in the system. The
system uses Lagrangian drifters equipped with GPS to calculate the velocity of the water. It is stated
that the use of drifters can closely resemble a system that uses fixed sensors to measure the
velocity of the water flow [6,7]. To study the characteristics of water flow, the Saint-Venant
equations have been used to model a wide variety of water flow systems like shallow water flow,
one-dimensional flow, two-dimensional flow, etc. [6,7]. For example, a mathematical model using the
Saint-Venant one-dimensional shallow water equations for the water flow problem is discussed in [6].


17.2 Overview of the Framework


Figure 17.1 shows the framework we consider in this work. The framework consists of a typical 
water system with stationary level sensors and mobile drifters. The drifters sense velocity of the 
stream and are equipped with GPS. The level sensors sense water level. 

The water system is modeled using the Saint-Venant model. The Saint-Venant equations are
hyperbolic nonlinear equations and are therefore not directly compatible with
the Kalman filter, which is designed for linear systems. So, to combine the hyperbolic water
system model and the Kalman filter model, the Saint-Venant equations are linearized using Taylor
series expansion (as described in Section 17.3.3). Furthermore, the Kalman filter requires time to
be broken into discrete steps, as it estimates the sensor readings at the next time stamp using data
from previous time stamps. Hence, we divide time and the water channel length into small intervals
$\Delta t$ and $\Delta x$, respectively. The Lax diffusive scheme is then applied to the linearized
Saint-Venant equations to discretize them, based on which the Kalman filter can estimate the
readings from the sensors. As shown in Figure 17.1, the periodic readings from the wireless sensors
and the output generated by the Kalman filter are then fed to a detector. The difference between
these two values is compared with a threshold to detect whether the sensors have been compromised
or are at fault.
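
The last step of the framework, the threshold comparison, can be sketched as follows. The Kalman
filter itself is not implemented here; its predictions are supplied as plain numbers. The sample
values, the threshold of 0.25 m, and the function name `detect` are hypothetical.

```python
def detect(readings, estimates, threshold):
    """Flag each time step whose residual |reading - estimate| exceeds the threshold."""
    return [abs(z - z_hat) > threshold for z, z_hat in zip(readings, estimates)]

readings = [2.00, 2.02, 2.01, 2.90, 2.03]    # hypothetical water levels (m)
estimates = [2.01, 2.01, 2.02, 2.02, 2.02]   # hypothetical filter predictions (m)

alarms = detect(readings, estimates, threshold=0.25)
print(alarms)
```

Only the fourth sample deviates from its prediction by more than the threshold, so it alone would
be reported as a possible attack or fault.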


17.3 Modeling the Water Supply System 


In this section, the Saint-Venant equations are introduced to model the water system. For a 
steady-state water flow, the hyperbolic and continuous Saint-Venant model is then linearized and 
discretized for its direct application in the Kalman filter (to be discussed in Section 17.4). 


17.3.1 Saint-Venant Model 


Figure 17.1 The water supply system: (a) physical water system and (b) central system with estimator and detector.

The Saint-Venant equations are derived from the conservation of mass and momentum [8]. They are first-order nonlinear hyperbolic PDEs. For one-dimensional flow with no lateral inflow, they can be written as

$$\frac{\partial A}{\partial t} + \frac{\partial Q}{\partial x} = 0 \tag{17.1}$$

$$\frac{\partial Q}{\partial t} + \frac{\partial}{\partial x}\left(\frac{Q^2}{A}\right) + gA\,\frac{\partial H}{\partial x} = gA\,(S_b - S_f) \quad \text{for } (x,t) \in (0, L) \times \mathbb{R}^+ \tag{17.2}$$


where
$L$ is the length of the flow (m)
$Q(x,t) = V(x,t)\,A(x,t)$ is the discharge or flow (m$^3$/s) across the cross section $A(x,t) = T(x)\,H(x,t)$ (m$^2$)
$V(x,t)$ is the velocity (m/s)
$H(x,t)$ is the water depth (m)
$T(x)$ is the free surface width (m)
$S_f(x,t)$ is the friction slope
$S_b$ is the bed slope
$g$ is the gravitational acceleration (m/s$^2$)


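These equations can be expressed [6] in terms of water depth and velocity as

$$T\,\frac{\partial H}{\partial t} + \frac{\partial (THV)}{\partial x} = 0 \tag{17.3}$$

$$\frac{\partial V}{\partial t} + V\,\frac{\partial V}{\partial x} + g\,\frac{\partial H}{\partial x} = g\,(S_b - S_f) \tag{17.4}$$

The friction slope is modeled empirically by the Manning-Strickler formula:

$$S_f = \frac{m^2\,V|V|\,(T + 2H)^{4/3}}{(TH)^{4/3}} \tag{17.5}$$

where $m$ is Manning's roughness coefficient (s/m$^{1/3}$).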


17.3.2 Steady-State Flow 


There exists a steady-state solution of the Saint-Venant equations under constant boundary conditions [7]. We denote the variables corresponding to the steady-state condition by the subscript 0. Dropping the term containing $\partial/\partial t$ and expanding Equation 17.3, we obtain the following equation:

$$\frac{dV_0(x)}{dx} = -\frac{V_0(x)}{H_0(x)}\,\frac{dH_0(x)}{dx} - \frac{V_0(x)}{T(x)}\,\frac{dT(x)}{dx} \tag{17.6}$$

Solving Equations 17.4 and 17.6, we get

$$\frac{dH_0(x)}{dx} = \frac{S_b - S_f}{1 - F_0(x)^2} \tag{17.7}$$

with $F_0 = V_0/C_0$ and $C_0 = \sqrt{gH_0}$. Here, $C_0$ is the gravity wave celerity and $F_0$ is the Froude number.
We assume the flow to be subcritical, that is, $F_0 < 1$ [7].
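As an illustration of Equations 17.5 and 17.7, the following minimal sketch integrates the steady-state depth profile numerically. It rests on several assumptions that are not part of the chapter: a uniform width `T`, a constant steady-state discharge `Q0` (so that $V_0 = Q_0/(TH_0)$), a known depth at $x = 0$, and a simple forward Euler step. The function and parameter names are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def friction_slope(V, H, T, m):
    """Manning-Strickler friction slope S_f of Equation 17.5."""
    return m**2 * V * abs(V) * (T + 2.0 * H) ** (4.0 / 3.0) / (T * H) ** (4.0 / 3.0)


def steady_state_profile(Q0, H_start, T, m, S_b, length, n_steps):
    """Integrate Equation 17.7 with forward Euler for a channel of uniform width T.

    Q0 is the (assumed constant) steady-state discharge, so V0 = Q0 / (T * H0).
    Returns a list of (x, H0, V0) tuples from x = 0 to x = length.
    """
    dx = length / n_steps
    H = H_start
    profile = []
    for i in range(n_steps + 1):
        V = Q0 / (T * H)
        froude = V / math.sqrt(G * H)
        if froude >= 1.0:
            raise ValueError("flow is not subcritical (F0 >= 1)")
        profile.append((i * dx, H, V))
        # Equation 17.7: dH0/dx = (S_b - S_f) / (1 - F0^2)
        H += dx * (S_b - friction_slope(V, H, T, m)) / (1.0 - froude**2)
    return profile
```

When the bed slope equals the friction slope at the starting depth, the right-hand side of Equation 17.7 vanishes and the sketch returns a constant depth, which is a convenient sanity check.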
17.3.3 Linearized Saint-Venant Model 


The linearized Saint-Venant model is obtained by linearizing about the steady-state flow characterized by $V_0$ and $H_0$ [7]. Let $v(x,t)$ and $h(x,t)$ denote the first-order perturbations in water velocity and water level. Then,

$$V(x,t) = V_0(x) + v(x,t) \tag{17.8}$$

$$H(x,t) = H_0(x) + h(x,t) \tag{17.9}$$

These expressions for $V$ and $H$ are substituted into Equations 17.3 and 17.4 and expanded in Taylor series. We use $T_0$ in place of $T$ to emphasize that the width is uniform. As described in [7], neglecting higher-order terms, a given term $f(V, H)$ of the Saint-Venant model can be written as $f(V, H) = f(V_0, H_0) + (f_V)_0\,v + (f_H)_0\,h$, in which $(\cdot)_0$ indicates evaluation at steady-state conditions. The linearized Saint-Venant equations are then obtained as follows [6,9]:


$$h_t + H_0(x)\,v_x + V_0(x)\,h_x + \alpha(x)\,v + \beta(x)\,h = 0 \tag{17.10}$$

$$v_t + V_0(x)\,v_x + g\,h_x + \gamma(x)\,v + \eta(x)\,h = 0 \tag{17.11}$$


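where, for uniform width $T_0$, the coefficients $\alpha(x)$, $\beta(x)$, $\gamma(x)$, and $\eta(x)$ follow from linearizing Equations 17.3 through 17.5 about the steady state and are given by

$$\alpha(x) = \frac{dH_0}{dx} \tag{17.12}$$

$$\beta(x) = \frac{dV_0}{dx} \tag{17.13}$$

$$\gamma(x) = 2gm^2\,\frac{|V_0|\,(T_0 + 2H_0)^{4/3}}{(T_0H_0)^{4/3}} + \frac{dV_0}{dx} \tag{17.14}$$

$$\eta(x) = -\frac{4}{3}\,gm^2\,\frac{V_0|V_0|\,T_0^2\,(T_0 + 2H_0)^{1/3}}{(T_0H_0)^{7/3}} \tag{17.15}$$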
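The friction-related parts of the coefficients in Equation 17.11 are the partial derivatives of $g\,S_f$ with respect to $V$ and $H$, evaluated at the steady state. The sketch below writes these derivatives out for uniform width $T_0$ and provides a central-difference helper to check them numerically against Equation 17.5. The closed forms are derived here from Equation 17.5 and should be read as an assumption of this sketch; the full $\gamma(x)$ additionally contains the $dV_0/dx$ contribution from the convective term, which is not checked. All function names are hypothetical.

```python
G = 9.81  # gravitational acceleration (m/s^2)


def friction_slope(V, H, T0, m):
    """Manning-Strickler friction slope S_f of Equation 17.5."""
    return m**2 * V * abs(V) * (T0 + 2.0 * H) ** (4.0 / 3.0) / (T0 * H) ** (4.0 / 3.0)


def gamma_friction(V0, H0, T0, m):
    """g * dS_f/dV at the steady state (friction part of gamma)."""
    return (2.0 * G * m**2 * abs(V0)
            * (T0 + 2.0 * H0) ** (4.0 / 3.0) / (T0 * H0) ** (4.0 / 3.0))


def eta_friction(V0, H0, T0, m):
    """g * dS_f/dH at the steady state (the coefficient eta)."""
    return (-4.0 / 3.0 * G * m**2 * V0 * abs(V0) * T0**2
            * (T0 + 2.0 * H0) ** (1.0 / 3.0) / (T0 * H0) ** (7.0 / 3.0))


def central_difference(f, x, eps=1e-6):
    """Second-order finite-difference approximation of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)
```

For example, `G * central_difference(lambda H: friction_slope(V0, H, T0, m), H0)` should agree closely with `eta_friction(V0, H0, T0, m)` for any subcritical steady state.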


17.3.4 Discretization 


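In order to discretize the linear equations obtained in Section 17.3.3, we use the Lax diffusive scheme [6] as follows. The channel is divided into segments of length $\Delta x$ and a suitable time interval $\Delta t$ is selected. With $v_i^k$ and $h_i^k$ denoting the perturbations at grid point $i$ and time step $k$, the derivatives are approximated by

$$\frac{\partial v}{\partial t} = \frac{v_i^{k+1} - \frac{1}{2}\left(v_{i+1}^k + v_{i-1}^k\right)}{\Delta t} \tag{17.16}$$

$$\frac{\partial v}{\partial x} = \frac{v_{i+1}^k - v_{i-1}^k}{2\Delta x} \tag{17.17}$$

$$\frac{\partial h}{\partial t} = \frac{h_i^{k+1} - \frac{1}{2}\left(h_{i+1}^k + h_{i-1}^k\right)}{\Delta t} \tag{17.18}$$

$$\frac{\partial h}{\partial x} = \frac{h_{i+1}^k - h_{i-1}^k}{2\Delta x} \tag{17.19}$$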
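Given $(h_i^k, v_i^k)_{i=0}^{I}$, we want to compute $(h_i^{k+1}, v_i^{k+1})_{i=0}^{I}$. Here, $I$ is the total number of segments of length $\Delta x$. Writing $H_{0,i}$, $V_{0,i}$, $\alpha_i$, $\beta_i$, $\gamma_i$, and $\eta_i$ for the steady-state quantities and coefficients at grid point $i$, the update equations for $(h_i, v_i)$ are

$$\begin{aligned}
h_i^{k+1} ={}& \tfrac{1}{2}\left(h_{i+1}^k + h_{i-1}^k\right) \\
&- \frac{\Delta t}{4\Delta x}\left(H_{0,i+1} + H_{0,i-1}\right)\left(v_{i+1}^k - v_{i-1}^k\right) \\
&- \frac{\Delta t}{4\Delta x}\left(V_{0,i+1} + V_{0,i-1}\right)\left(h_{i+1}^k - h_{i-1}^k\right) \\
&- \frac{\Delta t}{2}\left(\alpha_{i+1}v_{i+1}^k + \alpha_{i-1}v_{i-1}^k\right) \\
&- \frac{\Delta t}{2}\left(\beta_{i+1}h_{i+1}^k + \beta_{i-1}h_{i-1}^k\right)
\end{aligned} \tag{17.20}$$

$$\begin{aligned}
v_i^{k+1} ={}& \tfrac{1}{2}\left(v_{i+1}^k + v_{i-1}^k\right) \\
&- \frac{\Delta t}{4\Delta x}\left(V_{0,i+1} + V_{0,i-1}\right)\left(v_{i+1}^k - v_{i-1}^k\right) \\
&- \frac{g\,\Delta t}{2\Delta x}\left(h_{i+1}^k - h_{i-1}^k\right) \\
&- \frac{\Delta t}{2}\left(\gamma_{i+1}v_{i+1}^k + \gamma_{i-1}v_{i-1}^k\right) \\
&- \frac{\Delta t}{2}\left(\eta_{i+1}h_{i+1}^k + \eta_{i-1}h_{i-1}^k\right)
\end{aligned} \tag{17.21}$$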
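The Lax diffusive update can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the authors' implementation: it applies the scheme of Equations 17.16 through 17.19 to the linearized Equations 17.10 and 17.11 at interior grid points only, including the convective term $V_0 v_x$ in the velocity update, and it simply copies the two boundary values, on the assumption that boundary conditions are imposed elsewhere. The function name and argument order are hypothetical.

```python
def lax_step(h, v, H0, V0, alpha, beta, gamma, eta, g, dt, dx):
    """Advance the perturbations (h, v) by one time step of length dt.

    All arguments except g, dt, dx are lists indexed by grid point 0..I.
    Interior points 1..I-1 are updated; the boundary values at 0 and I
    are copied unchanged (assumption: handled by boundary conditions).
    """
    h_new, v_new = list(h), list(v)
    for i in range(1, len(h) - 1):
        p, q = i + 1, i - 1  # right and left neighbours
        h_new[i] = (0.5 * (h[p] + h[q])
                    - dt / (4 * dx) * (H0[p] + H0[q]) * (v[p] - v[q])
                    - dt / (4 * dx) * (V0[p] + V0[q]) * (h[p] - h[q])
                    - dt / 2 * (alpha[p] * v[p] + alpha[q] * v[q])
                    - dt / 2 * (beta[p] * h[p] + beta[q] * h[q]))
        v_new[i] = (0.5 * (v[p] + v[q])
                    - dt / (4 * dx) * (V0[p] + V0[q]) * (v[p] - v[q])
                    - g * dt / (2 * dx) * (h[p] - h[q])
                    - dt / 2 * (gamma[p] * v[p] + gamma[q] * v[q])
                    - dt / 2 * (eta[p] * h[p] + eta[q] * h[q]))
    return h_new, v_new
```

With all coefficients set to zero and a single bump in `h`, one step spreads the bump to the two neighbouring points and induces velocities of opposite sign on either side, which is the expected diffusive, wave-like behavior of the scheme.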


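If we assume that $\Delta x$ is very small, then we can write $h_{i-1} = h_i = h_{i+1}$ and $v_{i-1} = v_i = v_{i+1}$. Equations 17.20 and 17.21 then become

$$h_i^{k+1} = \left[1 - \frac{\Delta t}{2}\left(\beta_{i+1} + \beta_{i-1}\right)\right]h_i^k - \frac{\Delta t}{2}\left(\alpha_{i+1} + \alpha_{i-1}\right)v_i^k \tag{17.22}$$

$$v_i^{k+1} = -\frac{\Delta t}{2}\left(\eta_{i+1} + \eta_{i-1}\right)h_i^k + \left[1 - \frac{\Delta t}{2}\left(\gamma_{i+1} + \gamma_{i-1}\right)\right]v_i^k \tag{17.23}$$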


17.3.5 Discrete Linear State-Space Model 


From the discretized equations in Section 17.3.4, a state-space model can be formed as follows:

$$x(k+1) = A\,x(k) + B\,u(k) + w(k) \tag{17.24}$$

where $x(k) = (v_0^k, \ldots, v_I^k, h_0^k, \ldots, h_I^k)^T$, with the applied control $u(k)$ in the form of the discharge perturbation at the upstream end, $v_0^k$, and the discharge perturbation $w(k)$ at the downstream end, $v_I^k$ [8]. Here, $w_k$ and $x_0$ are independent Gaussian random variables, with $x_0 \sim N(0, \Sigma)$ and $w_k \sim N(0, Q)$.


17.4 Detecting Attacks Using Kalman Filter 


In this section, we introduce the Kalman filter [10] to obtain estimates of the state vector $x(k)$ described in Section 17.3.5. Figure 17.2 shows the control system of the Kalman filter, in which $y_k$ denotes the sensor readings (observations) from the water supply system and $x_k$ denotes the output of the control system that is fed to the controller. The observations $y_k$ are forwarded to the central system, which contains the estimator and the detector, at a regular time interval $\Delta t$. At each time step, the estimator generates estimated readings based on the readings of the previous time step. The detector then computes the difference between the newly observed sensor readings and the estimated readings.


17.4.1 Kalman Filter 


To apply the Kalman filter, the observation equation for the preceding system can be written as

$$y_k = C\,x_k + v_k \tag{17.25}$$

Here, $y_k = [y_1^k, \ldots, y_m^k]^T \in \mathbb{R}^m$ is the measurement vector collected from the sensors, and $y_i^k$ is the measurement generated by sensor $i$ at time $k$. The measurement noise $v_k$ is assumed to be white Gaussian noise with covariance $R$, independent of the initial conditions and the process noise.

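The Kalman filter can then be applied to compute the state estimate $\hat{x}_k$ from the observations $y_k$. Let the mean and covariance of the estimates be defined as follows:

$$\begin{aligned}
\hat{x}_{k|k} &= E[x_k \mid y_0, \ldots, y_k] \\
\hat{x}_{k|k-1} &= E[x_k \mid y_0, \ldots, y_{k-1}] \\
P_{k|k-1} &= \Sigma_{k|k-1} \\
P_{k|k} &= \Sigma_{k|k}
\end{aligned} \tag{17.26}$$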
Figure 17.2 Kalman filter. 

The iterations of the Kalman filter can be written as 


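$$\begin{aligned}
\hat{x}_{k+1|k} &= A\hat{x}_k + Bu_k \\
P_{k+1|k} &= AP_kA^T + Q \\
K_k &= P_{k|k-1}C^T\left(CP_{k|k-1}C^T + R\right)^{-1} \\
P_k &= P_{k|k-1} - K_kCP_{k|k-1} \\
\hat{x}_k &= \hat{x}_{k|k-1} + K_k\left(y_k - C\hat{x}_{k|k-1}\right)
\end{aligned} \tag{17.27}$$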


The initial conditions are $\hat{x}_{0|-1} = 0$ and $P_{0|-1} = \Sigma$, and $K_k$ is the Kalman gain. We assume that the Kalman gain converges in a few steps and that the filter is already in steady state. Then

$$P \triangleq \lim_{k \to \infty} P_{k|k-1}, \qquad K = PC^T\left(CPC^T + R\right)^{-1} \tag{17.28}$$


The Kalman filter update can then be written as

$$\hat{x}_{k+1} = A\hat{x}_k + Bu_k + K\left[y_{k+1} - C\left(A\hat{x}_k + Bu_k\right)\right] \tag{17.29}$$


The residue $z_{k+1}$ at time $k+1$ is defined as

$$z_{k+1} = y_{k+1} - \hat{y}_{k+1|k}$$

Equivalently,

$$z_{k+1} = y_{k+1} - C\left(A\hat{x}_k + Bu_k\right) \tag{17.30}$$

The estimation error $e_k$ is defined as

$$e_k = x_k - \hat{x}_k \tag{17.31}$$

Equivalently,

$$e_{k+1} = x_{k+1} - \hat{x}_{k+1} \tag{17.32}$$


Substituting $x_{k+1}$ and $\hat{x}_{k+1}$ from Equations 17.24, 17.25, and 17.29, we obtain the following recursive formula for the error:

$$e_{k+1} = (A - KCA)\,e_k + (I - KC)\,w_k - K\,v_{k+1} \tag{17.33}$$
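To make the recursion concrete, the following minimal sketch implements the steady-state filter for a scalar system, that is, with $A$, $B$, $C$, $Q$, and $R$ all being plain numbers. The scalar restriction is an assumption made here to keep the example free of matrix libraries; the water model of Section 17.3.5 is multidimensional. The function names are hypothetical.

```python
def steady_state_gain(A, C, Q, R, P0, iterations=200):
    """Iterate the covariance recursion of Equation 17.27 (scalar case).

    Returns the limiting a priori covariance P and the steady-state
    Kalman gain K of Equation 17.28.
    """
    P_prior = P0
    for _ in range(iterations):
        K = P_prior * C / (C * P_prior * C + R)   # gain K_k
        P_post = P_prior - K * C * P_prior        # a posteriori covariance P_k
        P_prior = A * P_post * A + Q              # next a priori covariance
    K = P_prior * C / (C * P_prior * C + R)
    return P_prior, K


def kalman_step(x_hat, u, y_next, A, B, C, K):
    """One steady-state update (Equations 17.29 and 17.30).

    Returns the new estimate and the residue z_{k+1}.
    """
    prediction = A * x_hat + B * u
    residue = y_next - C * prediction
    return prediction + K * residue, residue
```

For $A = C = Q = R = 1$, the covariance recursion has the fixed point $P^2 - P - 1 = 0$, so `steady_state_gain` converges to the golden ratio $P \approx 1.618$ with gain $K = P/(P+1) \approx 0.618$.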


17.4.2 Attack/Failure Detection 


Since the noises in the system are assumed to be Gaussian, we use a $\chi^2$ detector to quantify the difference between the values observed by the sensors and the values estimated by the Kalman filter:

$$g_k = z_k^T\,\mathcal{P}^{-1} z_k \tag{17.34}$$

where $\mathcal{P}$ is the covariance matrix of the residue $z_k$. The $\chi^2$ detector compares $g_k$ with a certain threshold and triggers an alarm for a potential attack or failure [11] if


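$$g_k > \text{threshold} \tag{17.35}$$

where $g_k$ is, in general, a function of the form

$$g_k = g\left(z_k, y_k, \hat{x}_k, \ldots, z_{k-T+1}, y_{k-T+1}, \hat{x}_{k-T+1}\right) \tag{17.36}$$

The function $g$ is continuous and $T \in \mathbb{N}$ is the window size of the detector [11].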
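A detector of this kind can be sketched as follows for a scalar residue. The particular choice of $g$ used here, the sum of the normalized squared residues over the last $T$ steps, is only one possible instance and is an assumption of this sketch, as are the class and parameter names. The threshold would in practice be chosen from the desired false-alarm rate.

```python
from collections import deque


class ChiSquareDetector:
    """Windowed chi-square style detector for a scalar residue z_k."""

    def __init__(self, residue_variance, threshold, window=1):
        self.residue_variance = residue_variance  # variance of z_k
        self.threshold = threshold
        self.recent = deque(maxlen=window)        # last T normalized squares

    def update(self, residue):
        """Record one residue and return (g_k, alarm)."""
        self.recent.append(residue**2 / self.residue_variance)
        g_k = sum(self.recent)
        return g_k, g_k > self.threshold
```

Feeding the residues produced by the Kalman filter into `update` at every time step yields an alarm whenever the accumulated normalized residue energy in the window exceeds the threshold.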


17.5 Summary 


The framework designed in this chapter detects attacks or failures by computing the difference between the values observed by the sensors and the values estimated by the Kalman filter. An alarm is triggered if the difference is larger than a given threshold. The threshold also accounts for measurement and system errors and helps avoid false alarms. Besides easily detecting an attacker who naively steals water, the framework can effectively detect whether an attacker has tampered with the sensor readings. However, the authors of [11] have presented an algorithm for attacking a system that uses a Kalman filter by generating an attack sequence that depends on the statistical properties of the system. To defend against such attacks, the users of the water system can precompute all the unstable eigenvectors of the system and identify the sensors that can be attacked. Another approach is to deploy redundant sensors in places where sensors are likely to be compromised, which can help detect false data injection. On top of these defense schemes, the users can also implement encryption algorithms to further enhance the security of the data transferred from the sensors to the central system.


17.A Appendix 


Symbols and Abbreviations            Explanation

CPS                                  Cyber-physical system
V, H                                 Water velocity and water depth
V_0, H_0                             Steady-state velocity and water depth
v, h                                 Perturbations in velocity and water depth
g                                    Acceleration due to gravity
T                                    Width of the water flow
A                                    Cross-section area (T x H)
Q                                    Flow per unit time (A x V)
S_f                                  Friction slope
S_b                                  Bed slope
m                                    Manning's roughness coefficient
C_0 = sqrt(g H_0)                    Gravity wave celerity
F_0 = V_0/C_0                        Froude number
alpha(x), beta(x), gamma(x), eta(x)  Coefficients for linearized Saint-Venant equations
n                                    Dimension of state space
x_k                                  State vector of dimension n at time k
y_k                                  Measurement vector collected from sensors
P_k                                  Covariance of the estimates
K_k                                  Kalman gain
z_k                                  Residue
e_k                                  Estimation error
Wireless sensor networks have been proposed for use in many challenging long-term applications such as military surveillance, habitat monitoring, infrastructure protection, scientific exploration, participatory urban sensing, and home energy management [1-3]. Compared with other high-end sensing technologies (e.g., satellite remote sensing), sensor networks are acclaimed to be low cost, low profile, and fast to deploy. With rapid advances in fabrication techniques, the computation and memory constraints of sensor nodes may cease to be major issues over time. However, energy will continue to be the victim of Moore's law; that is, more transistors mean more power consumption. Constrained by size and cost, existing sensor nodes (e.g., MicaZ, Telos, and mPlatform [4]) are equipped with limited power sources. Yet these nodes are expected to support long-term applications, such as military surveillance, habitat monitoring, and scientific exploration, which require a network life span ranging from a few months to several years.
To bridge the growing gap between the lifetime requirements of sensor applications and the slow progress in battery capacity, it is critical to have an energy-efficient communication stack. In this chapter, we introduce reliable and energy-efficient networking protocol designs for wireless sensor networks. Specifically, we present the following three types of representative design: (1) low-duty-cycle network protocol design; (2) exploring link correlation for energy-efficient network protocol design; and (3) energy-efficient opportunistic routing protocol design.


18.1 Low-Duty-Cycle Network Protocol Design 


Typically, energy in communication can be optimized through (1) physical-layer transmission rate scaling [5]; (2) link-layer optimization for better connectivity, reliability, and stability [6]; (3) network-layer enhancement for better forwarders and routes [7]; and (4) application-layer improvements for both content-agnostic and content-centric data aggregation and inference [8]. Although these solutions are highly diversified, they all assume a wireless network in which nodes are always ready to receive packets, and they focus mainly on the transmission side, a topic that has been of interest for years and has produced hundreds of related publications.

In contrast, wireless networks with intermittent receivers have received disproportionately little attention, despite the known fact that communication energy is consumed mostly in being ready for potential incoming packets, a problem commonly referred to as idle listening. For example, the widely used Chipcon CC2420 radio draws 19.7 mA when receiving or idle listening, which is actually more than the 17.4 mA it draws when transmitting. More importantly, packet transmission time is usually very small (e.g., <1 ms to transmit a TinyOS packet using a CC2420 radio), while the duration of idle listening for reception can be orders of magnitude longer. For example, most environmental applications, such as Great Duck Island [9] and Redwood Forest [10], sample the environment at relatively low rates (on the order of minutes between samples). With a comparable current draw and ~3 to 4 orders of magnitude longer duration waiting for reception, idle listening is a major energy drain that accounts for most of the energy spent in communication if it is not optimized.
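The share of radio charge lost to idle listening can be estimated with a few lines of arithmetic. The sketch below uses the CC2420 current figures quoted above and assumes, purely for illustration, one packet transmission per sampling period with the radio listening for the rest of the period; the function name and the example period are hypothetical.

```python
RX_CURRENT_MA = 19.7  # CC2420 current when receiving or idle listening
TX_CURRENT_MA = 17.4  # CC2420 current when transmitting


def idle_listening_share(tx_time_s, period_s):
    """Fraction of radio charge spent listening in an always-on radio.

    Assumes one transmission of tx_time_s seconds per period and idle
    listening for the remainder of the period.
    """
    tx_charge = TX_CURRENT_MA * tx_time_s
    listen_charge = RX_CURRENT_MA * (period_s - tx_time_s)
    return listen_charge / (tx_charge + listen_charge)
```

With a 1 ms transmission and an assumed 60 s sampling period, `idle_listening_share(0.001, 60.0)` exceeds 0.9999, which is why shutting the radio down between wake-ups matters far more than shortening transmissions.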

To reduce the energy lost to idle listening, a low-duty-cycle network is formed by nodes that listen to the channel very briefly and shut down their radios most of the time (e.g., 99% or more). At any given time, this type of network is actually fragmented (partitioned), and network connectivity (topology) becomes intermittent. Uniquely, communication delay in low-duty-cycle networks is dominated by sleep latency, that is, the time a sender waits for its receiver to wake up. Although low-duty-cycle networking is an ideal fit for many long-term unattended sensor applications, research has been lacking and focuses predominantly on physical- and link-layer designs. To ensure packet reception at low-duty-cycle receivers, several pioneering researchers have proposed B-MAC [11], X-MAC [12], WiseMAC [13], and the 802.15.4 beacon-enabled mode, which successfully reduce the amount of idle listening through techniques such as low-power listening and/or synchronous channel polling. These link-layer designs are effective; however, further improvement is difficult without utilizing information about topology and multi-hop connectivity at the network layer. For example, data reliability is commonly supported by link-layer protocols [11-16] through retransmission to the same receiver if the previous transmission fails. In a low-duty-cycle network, without a network-layer rerouting capability, these link-layer protocols have to wait for the intended receiver to wake up again, introducing excessive sleep latency on the order of seconds or possibly minutes.

Motivated by the insufficiency of link-layer designs, researchers proposed to investigate a wide spectrum of low-duty-cycle network configurations to support diversified types of applications. (1) The static low-duty-cycle wireless network supports mission-driven applications such as military surveillance with a specified network lifetime requirement and a fixed energy budget [17]. For example, a military strategic area shall be covered until a stronghold is established in 2 months. The duty cycle of this type of network is a fixed ratio of the battery lifetime to the network lifetime. (2) The dynamic low-duty-cycle wireless network supports scientific exploration such as habitat study [18] and structural monitoring [19]. Battery-powered sensors are not desirable in these long-term unattended applications, because replacing thousands of batteries at hard-to-access locations would add significantly to the cost of maintenance. In this type of network, energy harvesting techniques are normally employed [20-27] and the duty-cycle availability changes dynamically with fluctuating ambient energy [28].
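For the static case (1), the duty cycle follows directly from the two lifetimes. The sketch below interprets "battery lifetime" as the time the battery would last with the radio always on, which is an assumption of this example, and the numbers in the usage note are hypothetical.

```python
def static_duty_cycle(battery_lifetime_days, network_lifetime_days):
    """Duty cycle as the ratio of always-on battery lifetime to the
    required network lifetime, capped at 1 (radio always on)."""
    return min(1.0, battery_lifetime_days / network_lifetime_days)
```

For instance, a node whose battery would last 3 days with the radio always on, deployed for a 60-day mission, could afford a duty cycle of about 5%.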

In [29], researchers revealed that in low-duty-cycle networks, sleep latency makes traditional routing algorithms [30,31] ineffective, especially when the unreliable nature of wireless links is taken seriously into account. The key concept developed in [29] is dynamic switch-based forwarding (DSF), which utilizes a sequence of receivers to reduce the delay in source-to-sink communication. To minimize sleep latency, DSF utilizes multiple potential forwarding nodes at each hop. For a given sink, each node maintains a sequence of forwarding nodes sorted by their wake-up times. Packet transmission starts with the first node in the sequence. In case of failure, retransmission follows the sequence order until the packet is successfully received by one of the forwarding nodes. The key optimization problem in DSF is how to select a subset of forwarding nodes among all eligible nodes. For the first time, researchers in [29] revealed that low-duty-cycle networks have fundamentally different properties from always-awake networks. Notably, they found that the temporary routing loops introduced by DSF are actually beneficial in reducing end-to-end delay, a finding that invalidates the deeply rooted belief that network loops are always harmful.
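The idea of trying forwarders in wake-up order can be illustrated with a small sketch. It is not the DSF optimization of [29]: it only sorts a given candidate set by wake-up time and computes, for a single pass through the sequence, the probability that the packet is delivered and the expected sleep latency given that it is delivered. Independent link losses, the tuple layout, and the function names are assumptions of this example.

```python
def forwarding_sequence(candidates):
    """Sort (node_id, wake_up_time, link_quality) tuples by wake-up time."""
    return sorted(candidates, key=lambda c: c[1])


def expected_latency(sequence):
    """Return (delivery probability, expected latency given delivery)
    for one pass through the forwarding sequence."""
    p_all_failed = 1.0      # probability that every earlier attempt failed
    p_delivered = 0.0
    weighted_latency = 0.0
    for _node, wake_up, quality in sequence:
        p_here = p_all_failed * quality   # first success happens at this node
        p_delivered += p_here
        weighted_latency += p_here * wake_up
        p_all_failed *= 1.0 - quality
    if p_delivered == 0.0:
        return 0.0, float("inf")
    return p_delivered, weighted_latency / p_delivered
```

Adding a second forwarder that wakes up later raises the delivery probability of a pass, at the cost of a longer expected latency for the packets that end up using it; selecting which forwarders to include is the optimization problem mentioned above.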

Intended for system-wide dissemination of configurations and code binaries, flooding has been investigated extensively in wireless networks [32-34]. However, little work has been done for low-duty-cycle wireless sensor networks, in which nodes stay asleep most of the time and wake up asynchronously. In this type of network, the concept of broadcast changes: a broadcast packet can rarely be received by multiple nodes simultaneously, a constraint that makes the existing solutions unsuitable. In addition, a whole new level of complexity is added when the unreliable nature of wireless communication is considered. Researchers have tried to design an approach that lets a node make probabilistic forwarding decisions based on the delay distribution of the receiving nodes. Only early packets are forwarded, in order to achieve shorter flooding delays and reduce the level of redundancy.


18.2 Exploring Link Correlation for Energy-Efficient 
Network Protocol Design 


Wireless sensor networks normally operate in the 2.4 GHz radio spectrum, which is heavily used by many other devices, such as microwave ovens, medical diathermy machines, cordless phones, Bluetooth devices, and wireless access points. To understand how these devices impact communication performance between wireless sensor devices, extensive research [7,35-43] has been done to measure the packet reception quality of individual links in realistic environments. These in situ studies have shown that ideal models do not hold well in practice. For example, (1) signal propagation is not a fixed function of distance; (2) RF signal strength does not attenuate identically in all directions; and (3) link quality does not remain constant over time. If protocol designs are based on simplified assumptions, their performance is very poor in realistic environments. Although empirical studies on link performance are highly diverse, they predominantly focus on the analysis of individual links.

On the other hand, little research has been done to investigate the reception correlation among neighboring wireless links, despite the fact that wireless communication essentially occurs in a broadcast medium with concurrent receptions. For the sake of simplicity, many existing protocols implicitly assume link independence among concurrent receivers, another simplifying assumption that deserves the same level of inspection as the aforementioned ones. As shown in recent work [44], ignoring spatial link correlation could hinder further performance improvement in wireless communication. In [44], researchers demonstrated that if we assume the receptions of a broadcast packet by multiple neighboring nodes are probabilistically independent of each other, it is necessary to have a per-node acknowledgment (ACK) from all neighbors to achieve reliable broadcast [45-48], possibly leading to the ACK implosion problem [49] in high-density wireless sensor networks. Conversely, if the reception results of all receivers are identical (i.e., strongly correlated), the performance gain of opportunistic routing and collective forwarding [50,51] becomes less significant, because strong spatial link correlation reduces the multi-receiver diversity gain [52,53], which arises because the probability of successful reception among a cluster of potential receivers is larger than that of a single receiver.

In [44], researchers conducted an empirical study on link correlation. In the experiments, 42 MICAz nodes were used. The experiments were conducted with multiple randomly generated layouts under two scenarios: an open parking lot and an indoor office. In each scenario, the sender was placed in the center of the topology, while the other 41 nodes were randomly deployed as receivers. The sender broadcast a packet every 200 ms. Each packet was identified by a sequence number. The total number of packets broadcast was 6000. In both the indoor and outdoor experiments, researchers discovered that if a packet is received by a sensor node with a low packet reception ratio (PRR), most of the time this packet is also received by the high-PRR nodes. Figure 18.1 illustrates the first 600 packet receptions of three nodes in the indoor experiments. The black bands correspond to the packets received at the nodes. Clearly, there exists a strong correlation of packet receptions among the neighboring nodes. For example, in Figure 18.1, the two packets (sequence numbers 282 and 508) received by N22 were also received by N29 and N23. In order to quantify this correlation, researchers define the conditional packet reception probability (CPRP) as the probability that a node $N_i$ receives a packet $M$ from sender node $S$, given that the packet $M$ is received by another node $N_j$.

Figure 18.1 Correlation of packet reception among receivers.

In [44], researchers use $P_S(N_i|N_j)$ to denote the CPRP, where $N_i$ and $N_j$ are neighboring receivers of the sender $S$. Clearly, the CPRP $P_S(N_i|N_j)$ can be calculated as the percentage of packets received successfully at node $N_i$ among the packets that have been received by node $N_j$. In addition, the packet reception probability $P_S(N_i)$ can be calculated as the percentage of successful receptions at $N_i$, regardless of the reception result at $N_j$. If the receptions of wireless links are independent, $P_S(N_i|N_j)$ equals $P_S(N_i)$. To analyze the CPRP among pairwise receivers more systematically, researchers computed the CPRP for all node pairs with nonzero PRR values. In the indoor experiment, there exist 32 nonzero-PRR nodes, which generate $32 \times 31 / 2 = 496$ combinations of $P_S(N_i|N_j)$. Researchers found that $P_S(N_i|N_j)$ is not equal to $P_S(N_i)$ for about 98.18% of the receiver pairs. This demonstrates the existence of link correlation.
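The CPRP and PRR definitions translate directly into code. In the sketch below, each node's reception trace is assumed to be a list of 0/1 flags indexed by packet sequence number; this representation and the function names are illustrative assumptions, not the data format used in [44].

```python
def prr(trace):
    """Packet reception ratio: fraction of broadcast packets received."""
    return sum(trace) / len(trace)


def cprp(trace_i, trace_j):
    """Conditional packet reception probability P_S(N_i | N_j).

    Fraction of the packets received by N_j that were also received by N_i.
    Returns None when N_j received nothing, since the value is undefined.
    """
    received_by_j = [ri for ri, rj in zip(trace_i, trace_j) if rj]
    if not received_by_j:
        return None
    return sum(received_by_j) / len(received_by_j)
```

As a usage example, with `low = [1, 0, 0, 1, 0, 0, 0, 0]` and `high = [1, 1, 0, 1, 1, 0, 1, 1]`, `prr(high)` is 0.75 whereas `cprp(high, low)` is 1.0: every packet heard by the low-PRR node was also heard by the high-PRR node, so the two links are not independent.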

In recent MobiCom work [54], the authors demonstrate that link correlation does affect network performance noticeably, but they did not take the further step of exploiting this effect to improve flooding protocol design, as the researchers did in [44,55]. Existing flooding algorithms [45,47,49,56] have demonstrated their effectiveness in achieving communication efficiency and reliability in wireless sensor networks. Further performance improvement, however, has been hampered by the implicit assumption of link independence adopted in previous designs. In other words, existing flooding algorithms assume that the receptions of a flooding packet by multiple neighboring nodes are probabilistically independent of each other. Under such an assumption, it is necessary to have an ACK directly from every neighboring receiver, because a node's ACK cannot be used to confirm the reception at its neighboring receivers if link independence is assumed.

However, direct ACKs per receiver may lead to high collision [57,58], congestion [59], and possibly the ACK storm problem [49] in wireless networks. To address the problem, researchers designed the Collective Flooding (CF) protocol [44], which exploits spatial link correlation for performance improvement. The driving idea behind this is collective ACKs. Previously, a sender estimated whether a transmission was successful based only on the ACKs from the intended receiver. Instead, the mechanism of collective ACKs allows a sender to infer the success of a transmission to a receiver based on the ACKs from other receivers by utilizing the spatial link correlation among them.

Specifically, in CF, a node is called a covered node if it has already received the flooding packet. Covered nodes are responsible for rebroadcasting the packet to uncovered nodes in the network. The mechanism of collective ACKs allows a node to extract information about the status of its neighboring nodes by receiving or overhearing ACKs from its neighbors. For example, in Figure 18.2, suppose that node $S$ is a covered node while $N_1$ and $N_2$ are uncovered, and that all three are within 1-hop communication range of each other. When $S$ broadcasts and $N_1$ receives the packet, then in traditional flooding protocols, which do not consider correlation, $N_1$ only learns that $S$ is covered; it still considers $N_2$ uncovered (because wireless links are unreliable) until $N_1$ overhears the rebroadcast from $N_2$.

The CF protocol takes a different approach. In CF, every node keeps track of its neighbors' coverage probability. As shown in Figure 18.2, from $N_1$'s viewpoint, a packet from $S$ serves two purposes. First, it is a direct ACK, confirming that $S$ is a covered node. Second, it also serves as a collective ACK to $N_1$, indicating that $N_2$ has a reception probability of $P_S(N_2|N_1)$. In traditional designs, overhearing a packet serves only as a direct ACK that the packet sender (e.g., $S$ in Figure 18.2) is covered. In CF, a packet overheard by $N_1$ can also serve as partial evidence that another receiver of that packet (e.g., $N_2$ in Figure 18.2) is covered. Such partial evidence can be accumulated in a collective manner until a certain coverage probability threshold is reached. Once the threshold is reached, $N_1$ considers $N_2$ covered and hence refrains from redundant rebroadcasting.

Figure 18.2 An example of collective ACK.

Figure 18.3 Based on $N_1$'s ACK and $P_S(N_2|N_1) = 100\%$, $S$ knows $N_2$ is covered.

Exploiting spatial link correlation can greatly reduce redundant transmissions. For the sake of clarity, let us consider the hypothetical example shown in Figure 18.3. The two links from node $S$ to $N_1$ and $N_2$ each have a 50% PRR and are strongly correlated, that is, $P_S(N_2|N_1) = 100\%$. The link qualities from $N_2$ and $N_1$ back to $S$ are 10% and 100%, respectively. In traditional flooding protocols, the sender $S$ treats the receivers' packet receptions as independent. To provide reliable broadcasting, $S$ needs to continue transmitting until it receives ACKs or overhears rebroadcasts from both $N_1$ and $N_2$. Due to the low link quality (10%) from $N_2$ back to $S$, $S$ might conduct many unnecessary retransmissions. In contrast, with the knowledge of spatial link correlation, node $S$ can terminate the transmission once it receives an ACK from $N_1$, given the knowledge that $P_S(N_2|N_1) = 100\%$. As this simplified example shows, collective ACKs can improve the efficiency of a reliable flooding protocol by utilizing spatial link correlation.
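The saving in this example can be quantified under a few simplifying assumptions that go beyond the text: both receivers always receive the same packets (perfect correlation), every received copy is acknowledged, and each ACK gets through independently with the stated back-link quality. Under these assumptions the number of transmissions until $S$ stops is geometrically distributed. The function and variable names are hypothetical.

```python
def expected_transmissions(p_stop_per_attempt):
    """Mean of a geometric distribution: expected number of transmissions
    until the stopping condition is first met."""
    return 1.0 / p_stop_per_attempt


PRR_FORWARD = 0.5   # PRR of both links from S (perfectly correlated)
ACK_FROM_N1 = 1.0   # link quality from N1 back to S
ACK_FROM_N2 = 0.1   # link quality from N2 back to S

# Traditional flooding: S needs an ACK from N2 as well. Because N1's ACK
# always arrives when the packet is received, N2's ACK is the bottleneck.
traditional = expected_transmissions(PRR_FORWARD * ACK_FROM_N2)

# Collective ACKs: an ACK from N1 suffices, since P_S(N2|N1) = 100%.
collective = expected_transmissions(PRR_FORWARD * ACK_FROM_N1)
```

Under these assumptions, `traditional` evaluates to about 20 transmissions and `collective` to 2, a tenfold reduction for this deliberately extreme configuration.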

Clearly, the previous example can only show potential benefits at the conceptual level. It cannot reveal whether the benefit is significant in generic practical settings. To evaluate the practicality of the design, researchers implemented it on the TinyOS [60]/MICAz platform in nesC [61] and compared CF with Standard Flooding (FLD) and Reliable Broadcast Propagation (RBP) [45]. Experiments were conducted in both indoor and outdoor multi-hop environments. For example, in an outdoor experiment, 48 MICAz nodes were deployed along a 326 m long bridge. As shown in Figure 18.4a, the reliability of RBP8 (at most 8 retransmissions to uncovered nodes), RBP4 (at most 4 retransmissions), CF, and standard flooding was 99.96%, 97.6%, 99.93%, and 61.96%, respectively. While achieving reliability similar to that of RBP8, CF reduced the number of packets transmitted by 31.2%, as shown in Figure 18.4b. In addition, Figure 18.4c shows that the average dissemination delay of RBP8, RBP4, CF, and standard flooding was 4.46, 3.93, 2.85, and 2.34 s, respectively. That is, the average delay of CF was 36% less than that of RBP8.

The main reason for the performance difference is that RBP and standard flooding do not use spatial link correlation to predict the packet reception of neighboring nodes. Duplicated transmissions happen when a sender does not realize that neighboring receivers have already received the packet, so the sender retransmits the packet. Using spatial link correlation information, a sensor node can more accurately predict whether its neighbors have successfully received the packet, leading to fewer duplicated transmissions, lower congestion, shorter delays, and lower energy consumption.

Figure 18.4 The performance of the outdoor linear network experiment. (a) Reliability, (b) message overhead, (c) dissemination delay, and (d) load balance.


18.3 Energy-Efficient Opportunistic Routing Protocol Design 


Wireless networking is an active research area, and many wireless routing protocols and scheduling policies have been proposed [62-73]. Most of these routing protocols preselect a minimum-cost single path or multiple alternate paths and use the preselected static path(s) to forward data packets. At any specific time, these routing protocols use unicast to forward data packets along a single path, even though they discovered multiple alternate paths during the route discovery process. On the other hand, broadcasting protocols have already been extensively investigated [44,74-78]. The literature on broadcasting protocol design can be classified into two categories: deterministic approaches and probabilistic approaches. In the deterministic approaches, a fixed node within a connected dominating set is designated as a forwarding node. These approaches are also called fixed-forwarder approaches. In these approaches, the connected dominating set is calculated by using global or local information. In a probabilistic approach, when a node receives a packet, it forwards the packet with probability p. The value of p is determined by relevant information gathered at each node. Simple probabilistic approaches predefine a single probability for every node to rebroadcast the received packet. When these protocols run in a network with varying node densities, the nodes in a dense area may receive many redundant transmissions. More complicated and efficient protocols, such as distance-based and location-based [75] schemes, use either area or precise position information to reduce the number of redundant transmissions.
Despite this rich literature, the existing broadcasting approaches aim to disseminate packets from one node to all the other nodes in the network by using broadcast. ExOR [79] is an influential opportunistic routing protocol for wireless mesh networks. ExOR preselects prioritized forwarding candidates, and each packet carries the list of those forwarding candidates. Only receivers in the forwarding list forward packets, in the order of a forwarding priority estimated from their proximity (measured using ETX [80]) to the destination.

Zhong et al. [81] have proposed the expected any-path transmissions system. Westphal has 
proposed opportunistic routing in dynamic ad hoc networks (the OPRAH protocol) [82]. OPRAH 
prepares multiple paths for a destination, and each packet carries a forwarding node list as in ExOR. 
These opportunistic routing protocols either preselect a forwarder list or prepare multiple paths 
and use them to forward data packets. Furthermore, these protocols mainly focus on data plane 
improvement on top of an underlying link state routing protocol or AODV-style control plane protocols. 

Unlike the previous approaches, the energy-efficient routing (E²R) protocol [83] neither preselects 
any single or multiple alternate paths nor maintains next hop information. Instead, E²R 
delivers data packets using broadcast and simultaneously utilizes all the neighboring nodes to forward 
data packets. Moreover, E²R simultaneously focuses on the reduction of control overhead 
and the enhancement of data delivery ratio. 

The key idea of E²R is to exploit spatial diversity (i.e., broadcast nature) in wireless networks 
rather than specifying data delivery paths (i.e., next hops). In E²R, the source node does not specify 
any particular paths. Both route metric discovery (RMD) packets and data packets are delivered 
through broadcast. Nodes that have better opportunities to deliver the packets are automatically 
selected to forward the control packets and data packets. Similar to other wireless routing protocols, 
E²R operates in two phases: RMD and data delivery. In this section, we first briefly introduce these 
two phases and discuss the design challenges in these two phases. The detailed design is described 
in Section 18.3.3. 


18.3.1 Route Metric Discovery Phase 


In this phase, RMD packets are delivered via broadcast. There are two challenges that have to be 
carefully addressed. The first challenge concerns how to prevent the repeated flooding of RMD 
packets when the network is first constructed. At the early stage of network construction, RMD 
packets must be delivered to every node inside the network since the source node and intermediate 
nodes do not know the direction of the destination node. Controlled flooding schemes (e.g., each 
node always forwards the newly received RMD packet once or multiple times) can be applied. 
However, flooding schemes can cause many unnecessary rebroadcasts of RMD packets, especially 
when the network density is high. To address this challenge, a greedy forwarding algorithm is 
devised for the distribution of RMD packets. The current forwarding node’s covered neighbor 
list is embedded in the control packets. Here, we say a node is covered if it has already received 
the packet. Whenever a node receives the packet, it marks which of its neighbors have already been 
covered based on the covered neighbor list in the packet. Then the node sets a waiting (i.e., back-off) 
time based on the number of its neighbors that have not been covered. Intuitively, the node with 
more uncovered neighbors should have higher priority to forward the packet. Therefore, if a node 
has a large number of uncovered neighbors, its waiting time should be shorter. 

The second challenge concerns how to reduce redundant transmissions of the route metric 
reply (RMREP) packet. At the stage after the destination node receives the RMD packets, the 
destination node tries to deliver the RMREP packet back to the source node. Similarly, controlled 
flooding schemes will introduce redundant transmissions. This challenge differs from the first one 
because the destination node and the intermediate nodes already know the direction of the source 
node, having received the RMD packet initiated from the source node. Therefore, the 
challenge is to utilize this knowledge to further reduce redundant transmissions. To address this 
challenge, researchers introduce an efficient self-suppression scheme (described in Section 18.3.3), 
which suppresses the forwarding of the RMREP packet based on the route metric. 


18.3.2 Data Delivery Phase 


After the RMD phase, the source node obtains a route metric for the new destination. The source 
node now needs to deliver data packets to the destination node. The challenge is how to utilize 
spatial diversity to improve end-to-end performance (i.e., end-to-end packet delivery ratio and 
delay) and reduce energy consumption (by reducing the total number of packet transmissions 
inside the networks). 

In order to address this design challenge, researchers introduce a forwarder self-selection scheme, 
which is similar to the relay selection scheme used in [84]. The source node attaches the obtained 
route metric to data packets and broadcasts the data packets without designating forwarding nodes. 
Upon receiving the data packets, the nodes that have smaller route metric value than the attached 
route metric value are eligible to further forward the data packets. Before these nodes forward the 
received data packet, they wait for a small amount of time to do backoff based on their own route 
metric values for the destination. For example, the node with smaller route metric value will have 
shorter backoff time and select itself to forward the received data packets. During the backoff time 
interval, these nodes listen to the channel and suppress the forwarding of received data packets 
if they overhear data packets forwarded by a node with a smaller route metric value. When the 
backoff timer fires, the node updates the route metric value in the data packets by attaching its own 
route metric and forwards the data packets. 


18.3.3 Design of E²R 


This section describes the detailed design of E²R, which comprises the maintenance state, RMD, and 
data delivery. 


18.3.3.1 Maintenance State 


A node enters a maintenance state after it is deployed. While in the maintenance state, every 
node maintains its neighboring node information and the route metric from itself to all the other 
nodes inside the network. Like other wireless routing protocols, every node inside the network 
periodically sends out HELLO messages to indicate the existence of the node. Moreover, every 
node uses the HELLO messages received from its neighboring nodes to update its neighboring node 
set (N(i)). 

Besides neighboring node information, every node also maintains the route metric from itself to 
all other nodes inside the network. E²R is compatible with all the route metrics (e.g., ETX [80] 
or ETT [85]) that have been proposed. For example, we can give nodes with smaller ETX values 
higher priority to forward received packets. Without loss of generality, in this chapter we use 
distance vector (e.g., hop count) as the route metric. If a node s needs to route data packets to a 
destination node and there is no route metric maintained at node s for that node, s will initiate the 
RMD process. To reduce the transmission of control messages, the establishment and maintenance 
of route metrics is on demand. It is triggered when the source cannot deliver data packets to 
the destination. 


18.3.3.2 Route Metric Discovery 


The RMD process includes two stages: the stage of disseminating RMD packets and the stage of 
propagating RMREP packets. 


18.3.3.2.1 RMD Packets Dissemination 


During route metric discovery, the source node originates a new RMD packet if the source node needs to 
route data packets to a destination node and no route metric is available to the destination node. 
The RMD packet contains the source node id (s), packet id (Pid), source node’s covered neighbor 
list (CN), destination node id (d), route metric (R), and route metric sequence number from 
source to destination (S_s^d). When a node, i, receives an RMD packet, i processes the RMD packet 
based on the greedy forwarding algorithm (shown in Algorithm 18.1). In the first step, i updates its 
uncovered neighbor set (UN(i)) by using the source node’s covered neighbor list that is embedded 
in the RMD packet (Line 1). Here the uncovered neighbor set of i is the set of i’s neighbors that 
have not received the RMD packet. 


Algorithm 18.1 Greedy forwarding algorithm 


 1: Update UN(i) based on CN(s) 
 2: if (i ≠ s) and (i ≠ d) then 
 3:   // i is an intermediate node 
 4:   if new RMD then 
 5:     if S_i^d > S_s^d then 
 6:       // i has a fresher route metric to destination 
 7:       i sends RMREP with S_i^d 
 8:     else if all neighbors in N(i) received the RMD then 
 9:       drop RMD 
10:     else 
11:       // i has an out-dated route metric 
12:       wait for T_backoff period assigned based on UN(i) 
13:       if a neighbor forwarded RMD and UN(i) = ∅ during T_backoff then 
14:         drop RMD 
15:       else 
16:         CN(s) ← CN(i), R ← R + 1, forward RMD 
17:       end if 
18:     end if 
19:   end if 
20: else if i = d and new RMD then 
21:   // i is the destination node 
22:   send RMREP 
23: end if 

If i is an intermediate node and this RMD packet is the one that i receives for the first time 
(Lines 2-4), there are three possible cases: 


■ Case 1: i has a fresher route metric to the destination than the source has. In other words, 
the sequence number of the route metric from i to the destination (S_i^d) is larger than the 
sequence number of the route metric from the source to the destination (S_s^d). In this case, 
i directly returns an RMREP packet with S_i^d (Lines 5-7). 

■ Case 2: All neighbors of i have received the RMD packet (i.e., UN(i) = ∅). In this case, 
there is no need for i to rebroadcast the RMD packet. Therefore, i drops the RMD packet 
(Lines 8 and 9). 

■ Case 3: i does not have a fresher route metric to the destination than the source and some 
neighbors of i are uncovered. In this case, i sets the backoff timer with time interval T_backoff. 
Here the value of T_backoff is inversely proportional to the size of the uncovered neighbor 
set UN(i). The larger the uncovered neighbor set UN(i), the smaller the value of T_backoff. 
Therefore, the E²R protocol allows the node (assume node j) that has a larger number of 
uncovered neighbors to rebroadcast the RMD packet first. When i overhears j’s rebroadcast 
during i’s backoff time interval, i updates its uncovered neighbor set UN(i). If all the 
neighbors of i are covered, then i drops the RMD packet (Lines 10-14). Otherwise, i will 
update the covered neighbor set CN(s) with i’s covered neighbor set CN(i) and increase 
the value of route metric R (if the route metric is hop count, then the number of hop 
count is increased by 1) in the RMD packet and rebroadcast the RMD packet (Lines 15 
and 16). 


If i is the destination node and this RMD packet is the one that i receives for the first time, 
i will send back an RMREP packet (Lines 20-23). 
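
The decision logic of Algorithm 18.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the protocol's reference implementation: the names RmdPacket, Node, on_rmd, and on_backoff_expired are hypothetical, the constant MAX_BACKOFF is an invented scaling factor for the inversely proportional waiting time, and the covered neighbor list is assumed to already contain the sender of the packet.

```python
from dataclasses import dataclass, field

MAX_BACKOFF = 0.1  # seconds; hypothetical scaling constant, not from the text


@dataclass
class RmdPacket:
    source: str    # s
    dest: str      # d
    pid: int       # Pid
    covered: set   # CN: nodes assumed to have received the packet already
    metric: int    # R: hop count travelled so far
    seq: int       # S_s^d: sequence number of the source's metric to d


@dataclass
class Node:
    node_id: str
    neighbors: set                               # N(i)
    seq_to: dict = field(default_factory=dict)   # known S_i^d per destination
    seen: set = field(default_factory=set)       # (source, pid) pairs already handled


def on_rmd(node, pkt):
    """Decide what node i does when an RMD packet arrives (Lines 1-12, 20-22)."""
    uncovered = node.neighbors - pkt.covered          # Line 1: update UN(i)
    key = (pkt.source, pkt.pid)
    if key in node.seen or node.node_id == pkt.source:
        return ("ignore", None)                       # not a new RMD for this node
    node.seen.add(key)
    if node.node_id == pkt.dest:
        return ("send_rmrep", None)                   # destination replies (details not modelled)
    own_seq = node.seq_to.get(pkt.dest)
    if own_seq is not None and own_seq > pkt.seq:
        return ("send_rmrep", own_seq)                # Case 1: fresher route metric
    if not uncovered:
        return ("drop", None)                         # Case 2: all neighbors covered
    # Case 3: wait; more uncovered neighbors means a shorter waiting time.
    return ("backoff", MAX_BACKOFF / len(uncovered))


def on_backoff_expired(node, pkt, overheard_covered):
    """After the wait, drop or rebroadcast (Lines 13-16)."""
    uncovered = node.neighbors - pkt.covered - overheard_covered
    if not uncovered:
        return None                                   # neighbors were covered meanwhile: drop
    # Assumption: once i rebroadcasts, i and all of its neighbors count as covered.
    return RmdPacket(pkt.source, pkt.dest, pkt.pid,
                     covered=node.neighbors | {node.node_id},
                     metric=pkt.metric + 1, seq=pkt.seq)
```

Splitting the handler into an arrival step and a timer-expiry step mirrors the way the waiting period separates Lines 12 and 13 of the algorithm; a real node would feed overheard_covered from the rebroadcasts it hears while waiting.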


18.3.3.2.2 RMREP Propagation 


As discussed in Section 18.3.3.2.1, RMREP packets are generated either by the destination or by 
an intermediate node that has a fresher route metric to the destination than the source node. 
The RMREP contains the destination node id (d), source node id (s), packet id (Pid), route 
metric (R), and route metric sequence number from source to destination (S_s^d). Here the route 
metric R is the number of hops from source to destination. Since the RMREP does not contain any 
intermediate node information, it will be unnecessarily propagated to all nodes inside the network, 
which will result in a large amount of energy waste. In order to address this issue, the E²R protocol 
introduces an efficient self-suppression scheme which contains two rules: 


■ Rule 1: If the node has rebroadcasted the source-originated RMD packet, this node is 
eligible to forward the RMREP packet. No other node is eligible to forward the RMREP 
packet. This rule avoids unnecessary rebroadcasts from nodes far from both the source and 
the destination. 

■ Rule 2: If the route metric of the node to the destination is larger than the route metric in 
the RMREP packet, this node is not eligible to forward the RMREP packet. For example, if 
we use hop count as the route metric and the node has a larger number of hops to the destination 
than the source has, the source will not use this node to forward the data packet. Therefore, 
there is no need to let this node forward the RMREP packet. 
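
The two rules amount to a simple eligibility test. The following Python sketch shows one way to express it; the function name may_forward_rmrep and its parameters are hypothetical, and it assumes that each node already knows its own route metric to the destination when the RMREP arrives.

```python
def may_forward_rmrep(rebroadcast_rmd, own_metric_to_dest, rmrep_metric):
    """Return True if a node may forward an RMREP under the two suppression rules.

    rebroadcast_rmd:     True if this node rebroadcast the source's RMD packet.
    own_metric_to_dest:  this node's route metric (e.g., hop count) to the destination.
    rmrep_metric:        the route metric R carried in the RMREP packet.
    """
    if not rebroadcast_rmd:
        return False          # Rule 1: only RMD rebroadcasters are eligible
    if own_metric_to_dest > rmrep_metric:
        return False          # Rule 2: farther from the destination than the source
    return True
```

With hop count as the metric and R = 4 in the RMREP, a node that rebroadcast the RMD and is 3 hops from the destination stays eligible, while a node 6 hops away suppresses itself.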


18.3.3.3 Data Delivery 


After the source receives the RMREP packet and obtains the route metric, the source needs to 
forward the data packets to the destination node. Unlike other wireless routing protocols (such 
as AODV) and forwarding methods (such as ExOR), in E²R, the source node does not need to 
designate its next hop or a forwarder list within the data packets. The source node only attaches 
the obtained route metric to data packets and then broadcasts the data packets. In this scheme, 
the forwarders of the data packets are selected by the intermediate nodes themselves. It is called a 
forwarder self-selection scheme. 

When an intermediate node i receives the data packets, i compares its own route metric value 
with the value of the route metric embedded in the data packets. If i’s route metric value is smaller 
than the value of the route metric embedded in the data packets, i selects itself to be a potential 
forwarder of the data packets. However, i does not know whether its neighbors also received the data 
packets and have smaller route metric values than i. In order to handle this problem, we introduce 
a backoff mechanism and design the backoff time interval based on the route metric values. The 
smaller the value of the route metric a node has, the shorter the backoff time this node experiences. 
During the backoff time interval, i listens to the channel and suppresses the forwarding of the data 
packets if i overhears that one of its neighbors with a better route metric already forwarded the data 
packets. If i does not overhear its neighbors’ forwarding and its backoff timer fires, i updates the 
route metric in the data packets with its own route metric and then rebroadcasts the data packets. 
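
The self-selection step can be illustrated with a small Python sketch. It is a simplification under stated assumptions: select_forwarder and SLOT are hypothetical names, the backoff is taken to be linear in the route metric, every receiver is assumed to overhear every other receiver, and ties are broken by node id purely to keep the example deterministic.

```python
SLOT = 0.001  # seconds of backoff per unit of route metric; hypothetical value


def select_forwarder(packet_metric, receivers, slot=SLOT):
    """Return (node_id, new_packet_metric) for one hop, or None if nobody is eligible.

    packet_metric: route metric currently attached to the data packet.
    receivers:     maps node id -> that node's own route metric to the destination.
    """
    # Only nodes closer to the destination than the attached metric are eligible.
    eligible = {n: m for n, m in receivers.items() if m < packet_metric}
    if not eligible:
        return None
    # Smaller metric -> shorter backoff -> timer fires first; the others overhear
    # the rebroadcast during their own backoff and suppress themselves.
    winner = min(eligible, key=lambda n: (eligible[n] * slot, n))
    # The forwarder overwrites the packet's metric with its own before rebroadcasting.
    return (winner, eligible[winner])
```

For a packet carrying metric 5 that is heard by nodes with metrics 4, 2, and 6, the node with metric 2 forwards and the packet then carries metric 2, so the next hop again admits only nodes that are strictly closer to the destination.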

When the destination node receives the data packets, the destination node returns an ACK to 
the source node. The propagation of the ACK is similar to the propagation of the RMREP packet. 

Researchers have performed extensive simulations with various network configurations to evaluate 
the performance of E²R. The results show that the E²R protocol can provide a high packet delivery 
ratio, low control overhead, and low packet delivery delay in unreliable environments. Moreover, by 
reducing the number of packet transmissions, the E²R protocol can effectively reduce energy consumption, 
which makes it an energy-efficient routing protocol for multi-hop green wireless networks. 


19.1 Agents’ Negotiation for Wireless Sensor Networks 


In multiagent systems (MASs) [1,2], several contributions in the literature address the 
problem of sharing resources between two or more agents within a predefined space, for example [3]. 
For two agents to efficiently share resources, a mutually beneficial agreement should be reached so 
that each achieves part or all of its assigned goals (e.g., monitor/report temperature). Several 
constraints are considered by each agent while attempting to achieve the assigned goals (e.g., time 
vs. number of goals). 

In order for several agents/sensors to discuss the possibility of reaching a certain agreement, a 
common negotiation protocol must be applied, similar to the one explained in [4]. Then each 
agent adopts a specific negotiation strategy, such as that in [5], before engaging in a potentially 
beneficial interaction. The focus of these strategies varies depending on the purpose the agent or 
robot was designed for. For example, some negotiation strategies can be time or resource driven 
while others can be score-oriented. 

In this section, we give an overview of agents’ negotiation approaches from three different contexts. 
These contexts are (1) common settings, (2) service provision/acquisition, and (3) wireless networks. 


19.1.1 Agents’ Negotiation in Common Settings 


In [6], agents’ negotiation protocols are addressed with respect to three abstract domains: (1) 
task-oriented domains, where an agent’s activity is a set of tasks to be achieved; (2) state-oriented 
domains, where an agent is moving from an initial state to a set of goal states; and (3) worth-oriented 
domains, where agents evaluate each potential state to identify its level of desirability. 

In [5], a particular focus was given to agents interacting in distributed information retrieval 
systems, arguing that the cooperation of information servers increases with the advances 
made in agents’ negotiation. Two scenarios of agents’ negotiation are considered: (1) negotiation 
about data allocation, where autonomous agents/servers are sharing documents and need to 
decide how best to locate them, and (2) negotiation about resource allocation, where the 
main focus is given to domains of limited resources as well as those of unlimited ones, and agents 
are bilaterally negotiating to share expensive or common resources. 

Based on Rubinstein’s model for alternating offers [7], the negotiation protocol presented in 
[5] is straightforward. One agent makes an offer to another that has to choose between accepting, 
rejecting, or opting out of the negotiation process. Each agent has its own utility function that 
evaluates all possible negotiation results and a strategy to decide what actions to perform at every 
expected situation. Although agents’ negotiation about the allocation of limited resources is 
similar to our scenario, wherein sensors interact to determine the use of limited resources, 
in our case more than two sensors are expected to take part, and sensors are not expected to 
generate offers and wait for responses. 
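
The offer/response cycle described above can be illustrated with a toy Python sketch. It is not the protocol of [5]: the Agent class, its linear concession strategy, the integer resource units, and the patience threshold are all assumptions made for illustration; only the three possible responses (accept, reject, opt out) and the alternation of roles come from the description.

```python
from enum import Enum


class Response(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    OPT_OUT = "opt_out"


class Agent:
    def __init__(self, name, reservation, step, patience):
        self.name = name
        self.reservation = reservation  # smallest share (in units) the agent accepts
        self.step = step                # how much more it offers in each later round
        self.patience = patience        # round from which it opts out instead of rejecting

    def utility(self, share):
        return share                    # toy utility function: more units is better

    def propose(self, round_no, total):
        # Offer the opponent a growing share without giving up the agent's own reservation.
        return min(total - self.reservation, self.step * (round_no + 1))

    def respond(self, share, round_no):
        if self.utility(share) >= self.reservation:
            return Response.ACCEPT
        if round_no >= self.patience:
            return Response.OPT_OUT
        return Response.REJECT


def negotiate(a, b, total=10, max_rounds=20):
    """Alternate offers between a and b until agreement, opt-out, or timeout."""
    proposer, responder = a, b
    for round_no in range(max_rounds):
        share = proposer.propose(round_no, total)
        answer = responder.respond(share, round_no)
        if answer is Response.ACCEPT:
            return ("agreement", round_no,
                    {responder.name: share, proposer.name: total - share})
        if answer is Response.OPT_OUT:
            return ("opt_out", round_no, None)
        proposer, responder = responder, proposer   # roles alternate each round
    return ("no_agreement", max_rounds, None)
```

The sketch is strictly bilateral, which is exactly the limitation noted above for the sensor scenario, where more than two parties are involved.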

The Contract-Net protocol presented in [4] is a high-level negotiation protocol for communicating 
service requests among distributed agents. R. G. Smith considers high-level negotiation 
protocols as methods that lead system designers to decide “what agents should say to each other,” 
whereas low-level protocols make system designers decide “how agents should talk to each other.” The 
Contract-Net protocol assumes the simultaneous operation of both agents asking to execute tasks 
and agents ready to handle them. The asking agents broadcast a call for proposals, and the helping 
agents submit their offers; then one is granted the pending task, or the session is closed. Three 
points are worth highlighting in this approach: 


1. Linking high-level and low-level negotiation protocols is essential when it comes to 
agents interacting in environments with limited and variable resources. For example, when users 
of pocket computing devices delegate software agents to exchange and accomplish service 
requests on the go, the efficiency of the negotiation protocol that agents employ 
depends on the bandwidth the network offers and the time it takes to 
transfer agents’ requests/messages from one location to another. 

2. A central decision-making situation may easily occur when a service seeker initiates the call 
for proposals, receives back all of the prospects’ offers, and is the only 
agent that decides upon the termination of the negotiation process. 

3. In the Contract-Net, it is always assumed that two different types of agents are interacting 
(e.g., buyer and seller agents), which is different in the sensors scenario because they are all 
of the same type. 
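
The call-for-proposals cycle of the Contract-Net protocol can be sketched as follows. This Python fragment is an illustration only: contract_net is a hypothetical function, bids are modelled as plain cost values, and awarding the task to the lowest cost (ties broken by name) is an assumed evaluation rule rather than part of the protocol in [4].

```python
def contract_net(task, contractors):
    """Run one call-for-proposals round and return (winner, cost) or None.

    contractors maps an agent name to a bid function that takes the task and
    returns a cost, or None when the agent declines to bid.
    """
    # Step 1: the asking agent broadcasts the call for proposals.
    # Step 2: each helping agent answers with an offer or stays silent.
    bids = {}
    for name, bid_fn in contractors.items():
        cost = bid_fn(task)
        if cost is not None:
            bids[name] = cost
    # Step 3: the asking agent alone decides; with no offers the session is closed.
    if not bids:
        return None
    winner = min(bids, key=lambda n: (bids[n], n))   # lowest cost, ties by name
    return (winner, bids[winner])
```

The single decision point in step 3 is the central decision-making situation mentioned in the second point above.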


19.1.2 Agents’ Negotiation for Service Acquisition 


In [8], a service-oriented negotiation model for autonomous agents was presented. Following the 
traditional client/server approach, scholars assumed that an autonomous agent is a “client” to 
another serving agent “server” that is in turn delegated to achieve a certain goal, which is acquiring 
or selling a service. Authors have focused their research on the reasoning model an agent will 
employ to identify its prospective servers, deciding about whether to perform parallel or sequential 
negotiations, making or accepting an offer, and abandoning a process. 

Accordingly, the times to reach and execute an agreement were both considered while identifying 
their negotiation model. Based on a model for bilateral negotiation that was presented in [9], 
the authors of the service-oriented negotiation model proposed a multilateral variation of it to satisfy 
the application domain they are interested in. The requirements they attempted to satisfy 
(i.e., privacy of information, privacy of models, value restrictions, time restrictions, and resource 
restrictions) are related to our research focus, but they concern ordinary network communications. 

In [10], an agent-based architecture for service discovery and negotiation was presented. The 
realization of three novel requirements has motivated the work of these authors; these requirements 
are as follows: (1) interactions of software agents do not necessarily happen in one network 
only, and these interactions may involve more than two types of agents (e.g., service provision 
agent, service acquisition agent, and service evaluator agent); (2) diverse connection technologies 
can be utilized at different costs, which increases the complexity of a system and enables a higher 
level of end-user dynamicity; and (3) a service application should automatically react to changes 
as long as doing so benefits the end user. The scenario they used to motivate their work involves three 
different agents. The user agent is located on the portable device of the user, and it contacts a 
marketplace agent that is installed at a wireless hotspot location and is responsible for maintaining 
a list of available Internet service providers (ISPs), each of which has a representing agent called an 
ISP agent. 

An agreement is reached when the user agent succeeds in making a contract with one of the ISP 
agents and retrieves a configuration file that, eventually, the end user installs on the pocket device 
to get Internet access through the best available ISP. The sequence of interactions among the 
involved agents was described, and the negotiation protocols and strategies were the traditional FIPA 
Contract-Net [11] and FIPA-English Auction Protocol [12]. 

It is also worth highlighting here that four different types of auctions are widely considered in the 
literature of agents: (1) English, (2) Dutch, (3) first price sealed-bid, and (4) second-price sealed-bid 
(e.g., in [13]). These auction types share the same goal, which is granting a single item (sometimes 
combinatorial) to a single agent (sometimes a coalition) in a limited resources environment. An 
agent may participate in an auction, so one of the carried “personal” tasks can be accomplished 
or—like in cooperative systems—a learning behavior can be implemented, so agents are able to 
predict the future importance of this item to another agent, which is known as common value. 
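
As a small illustration of how two of these auction types differ, the following Python sketch determines the winner and the price for the sealed-bid variants. The function name sealed_bid_auction is hypothetical, ties are broken by bidder name for determinism, and letting a lone bidder pay its own bid in the second-price case is an assumption (no reserve price is modelled). The English and Dutch auctions, which proceed in open rounds, are not covered.

```python
def sealed_bid_auction(bids, second_price=False):
    """Return (winner, price) for a single-item sealed-bid auction.

    bids maps a bidder name to its sealed bid. With second_price=False the
    winner pays its own bid; with second_price=True it pays the second-highest bid.
    """
    if not bids:
        raise ValueError("at least one bid is required")
    # Highest bid first; equal bids are ordered by bidder name.
    ranked = sorted(bids.items(), key=lambda kv: (-kv[1], kv[0]))
    winner, top_bid = ranked[0]
    if second_price and len(ranked) > 1:
        return (winner, ranked[1][1])
    return (winner, top_bid)
```

With bids of 10, 7, and 9 from agents a, b, and c, agent a wins in both variants but pays 10 under first-price rules and 9 under second-price rules.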

A multiagent negotiation method for on-demand service composition is presented in [14]. 
Agents here are expected to negotiate in order to reach agreements about combining different 
services from different providers to finally meet a consumer’s expectations. The negotiation process 
functions by means of message exchange. When an invocation for service acquisition occurs, 
it is assumed that all available agents represent specific services in a network, and they receive 
a message that contains a set of requirements to fulfill. If a single agent is capable of providing 
this service on its own, it broadcasts an OK message; if not, it transmits a Nogood message to the 
others asking for help. Other agents receive this help request, review their capabilities, and give 
a response, and so on. 
Using message exchange to negotiate the acquisition of a service between entities located 
in a fixed wireline network is likely to be satisfactory. However, in a sensor network, where 
communication resources are expensive and limited, this approach will add a considerable amount 
of traffic, and it will increase the time a service application takes to act in response to requests. 


19.1.3 Agents’ Negotiation across Wireless Communications 


An architecture for pervasive negotiation that uses multiagent and context-aware technologies is 
proposed in [15]. The main focus of this research effort is to consider specific profiles, preferences, 
and locations so that personalized services are transmitted to other remote devices. The authors of 
this chapter have considered three different agents that are involved in any negotiation scenario: 
(1) user agents that announce the preferences of the assigned goal, (2) supplier 
agents that take into consideration the preferences and the context provided to compete for service 
provision, and (3) negotiation agents that maintain all of the allowed negotiation strategies and 
mediate between the earlier two types of agents. 

The negotiation mechanism proposed is influenced by semantic web approaches. The negotiation 
agent (i.e., the mediating agent) comes up with the best-fitting strategy in a particular situation 
by parsing a predefined negotiation ontology. This ontology is flexible and interactive so 
that end users and service providers are able to present their own negotiation conditions. 

Intelligent information agents [16] are those capable of interacting with several distributed or 
heterogeneous information systems, representing specific preferences in obtaining data and overcoming 
information overload. In [17], the authors have relied on information agents, data acquisition agents, 
and rule-based reasoning agents to build an MAS capable of receiving data from a legacy information 
system (enterprise resource planning) and controlling the extracted information using artificial 
intelligence techniques. That effort has made it possible for an existing information system to be 
customized according to the new preferences of end users without any reengineering processes. 

Using ubiquitous agents, another approach was taken in [18] to allow mobile devices to access 
Web information systems depending on their location. The authors have used agents to represent the 
goals nomadic users would like to achieve, store the exact location and connection features of each 
user, and then migrate these agents to different information systems (or other mobile devices) to 
find relevant data or another information agent capable of answering the user’s requests. In PUMAS, 
the agents’ negotiation was implemented using standard distributed systems techniques (message 
passing and recommendations) in spite of the dynamicity of mobile users and the limited resources 
of a mobile network. 

Coalition formation is another approach to agents’ negotiation. In [19], a model for coalition 
formation was proposed to enable each agent to select its allies individually. The model supports 
the formation of any coalition structure, and it does not require any extra communication or 
central coordination entity. Similarly, the definition of an optimal coalition in [20] is based on 
Pareto dominance and a distance weighting algorithm. In addition, in [21], a trusted kernel-based 
mechanism for forming a coalition is presented, rather than one based on the efficiency of solving a common task. 

Coming from a WSN background, the authors of [22] propose an approach for developing 
autonomous agents that can employ economic behavior while engaged in a negotiation process. 
Applying this behavior enables agents to independently take rational, economically driven decisions. 
The same authors described two related scenarios wherein agents negotiate the possibility 
of acquiring a service. These scenarios involve (1) the negotiation of the transportation and 
communication services with the available ad hoc network, and (2) granting the usage of the WSN 
to agents that have successfully reached an agreement. 
From electronics, another approach to automated agents’ negotiation and decision making in 
resource-limited environments was presented in [23]. Scholars here perceive an MAS as a set of 
distributed radio antennas of a cellular network, and by means of agents’ contracting, these agents 
would share the use of resources (e.g., bandwidth). 

In [24], the authors argue that by using reinforcement learning an agent becomes capable 
of employing a set of strategies that respond to opponents with various concessions and 
preferences. The protocol the authors propose lets agents respond to a negotiator by either giving 
a deadline for an agreement to be reached or rejecting completely. The two experiments 
they explained show how an agent learns a behavior while still keeping its 
functionalities at the same level of performance, which they call “not breaking down.” 


19.2 Agent Factory Micro Edition for Wireless Sensor Network Motes 


Agent factory micro edition (AFME) [25] is an intelligent agent framework for resource-constrained 
devices and is based on a declarative agent programming language, which, in a similar vein to other 
intelligent agent platforms, is used in conjunction with imperative components. These imperative 
components imbue agents with mechanisms that enable them to interact with their environment; 
agents perceive and act upon the environment through perceptors and actuators, respectively. The 
word perceptor is used rather than sensor to distinguish the software component from hardware 
sensors. 

Perceptors and actuators represent the interface between the agent and the environment and are 
implemented in Java. This interface acts as an enabling mechanism through which the agents are 
situated. AFME incorporates a variant, specifically designed for resource-constrained devices, of 
the agent factory agent programming language (AFAPL) [26]. AFAPL is a declarative language and 
is a derivative of Shoham’s agent-oriented programming framework [27]. It is based on a logical 
formalism of belief and commitment and forms part of the agent factory framework [26], which 
is an open source collection of tools, platforms, and languages that support the development and 
deployment of MASs. In its latest incarnation, the agent factory framework has been restructured 
to facilitate the deployment of applications that employ a diverse range of agent architectures. As 
such, the framework has become an enabling middleware layer that can be extended and adapted 
for different application domains. The framework is broadly split into two parts: 


■ Support for deploying agents on laptops, desktops, and servers 
■ Support for deploying agents on constrained devices such as mobile phones and WSN nodes 


AFME represents the latter, whereas the former is delivered through agent factory standard 
edition (AFSE). In the remainder of this chapter, we shall only consider AFME. An overview of 
the AFME control process is provided in Figure 19.1. In AFME, rules that define the conditions 
under which commitments are adopted are used to encode an agent’s behavior. The following is 
an example of an AFME rule: 


message(request, ?sender, removeData(?user)) > deleteRecord(?user); 


Figure 19.1 AFME control process. 


The truth of a belief sentence (text prior to the > symbol) is evaluated based on the current 
beliefs of the agent. The result of this query process is either failure, in which case the belief sentence 
is evaluated to false, or a set of bindings that cause the belief sentence to be evaluated to true. In 
AFAPL, the ? symbol represents a variable. In this example, if the agent has adopted a belief that it 
has received a message from another agent to remove user data, it adopts a commitment to delete 
the record related to the user. At an imperative level, a preceptor, which is written in Java, monitors 
the message transport service, which contains a server thread that receives incoming messages. Once 
a message is received, it is added to a buffer in the service. Subsequently, the perceptor adds a belief 
to the agents belief set. The interpreter periodically evaluates the belief set. If the conditions for a 
commitment are satisfied (i.e., all of the beliefs prior to the > symbol in the rule have been adopted), 
either a plan is executed to achieve the commitment or a primitive action or actuator is fired. In 
this chapter, we shall only consider primitive actions. When an actuator is created, it is associated 
with a symbolic trigger. In this case, a delete record actuator, written in Java, is associated with the 
trigger string deleteRecord(?user). Once the commitment is activated, the ?user variable is passed 
to the actuator and the imperative code for deleting the file is executed. Structuring agents in this 
manner is useful in that it enables their behavior to be altered at a symbolic level rather than having 
to modify the imperative code. 
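
The control cycle described above can be sketched in a few lines. The following Python fragment is an illustration only and is not the AFME interpreter or its API: beliefs and rules are modeled as nested tuples, strings beginning with ? are treated as variables, and the names unify, substitute, and run_cycle are hypothetical. It handles rules with a single belief on the left-hand side and passes the bound value to the matching actuator.

```python
# Illustrative sketch only: this is NOT the AFME interpreter or its API.
# Terms are nested tuples; strings starting with "?" are variables.

def unify(pattern, fact, bindings):
    """Return extended bindings if pattern matches fact, else None."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            return bindings if bindings[pattern] == fact else None
        return {**bindings, pattern: fact}
    if isinstance(pattern, tuple) and isinstance(fact, tuple):
        if len(pattern) != len(fact):
            return None
        for p, f in zip(pattern, fact):
            bindings = unify(p, f, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == fact else None


def substitute(term, bindings):
    """Replace bound variables in a term by their values."""
    if isinstance(term, tuple):
        return tuple(substitute(t, bindings) for t in term)
    return bindings.get(term, term)


def run_cycle(beliefs, rules, actuators):
    """One interpreter pass: fire the actuator for each satisfied rule."""
    fired = []
    for condition, commitment in rules:
        for belief in beliefs:
            bindings = unify(condition, belief, {})
            if bindings is not None:
                action = substitute(commitment, bindings)
                actuators[action[0]](*action[1:])  # trigger name selects the actuator
                fired.append(action)
    return fired


deleted = []  # stands in for the imperative code that deletes a record
beliefs = [("message", "request", "agentB", ("removeData", "alice"))]
rules = [(("message", "request", "?sender", ("removeData", "?user")),
          ("deleteRecord", "?user"))]
actuators = {"deleteRecord": deleted.append}
fired = run_cycle(beliefs, rules, actuators)
```

Changing the rule tuple alters the behavior without touching the actuator function, which mirrors the separation between the symbolic and imperative levels.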

In AFME, the commitment to the right of the implication (the > symbol) can take additional 
arguments. These arguments represent to whom the commitment is made, the time at which the 
commitment should be executed, the predicate for maintaining the commitment, and the utility 
values of the commitment. These additional arguments go beyond the scope of this discussion and 
shall not be described here.* 

* For a discussion of how these arguments are supported in AFME along with other features of AFME/AFSE, such 
as dynamic role adoption/retraction and the AFME intention selection process, see [29]. 

In order to facilitate communication between AFME agents in WSN applications, a wireless 
message transport service has been developed that can be controlled and monitored through the 
use of actuators and perceptors. The Sun SPOT motes [28] communicate using the IEEE 802.15.4 
standard. The wireless message transport service facilitates peer-to-peer communication between 
agents and is based on the Sun SPOT radiogram protocol rather than TCP/IP, which is used for 
agents deployed on mobile phones or PDAs that have a 3G or GPRS connection. The radiogram 
protocol uses datagrams to facilitate communication between motes. 

With the Sun SPOT radiogram protocol, the connections operating over a single hop have 
different semantics to those operating over multiple hops. This is due to a performance optimization. 
When datagrams are sent over more than one hop, there are no guarantees about delivery or 
ordering. In such cases, datagrams will sometimes be silently lost, be delivered more than once, and 
be delivered out of sequence. When datagrams are sent over a single hop, they will not be silently 
lost or delivered out of sequence, but sometimes they will be delivered more than once. 
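
Because duplicates can occur even over a single hop, and reordering can occur over multiple hops, a receiver may want to filter datagrams before handing them to the agent. The following Python sketch shows one possible way to do this. It assumes that the application adds its own sequence number to every datagram; this header and the class name DatagramReorderer are assumptions and are not part of the radiogram protocol.

```python
# Hypothetical helper, not part of the Sun SPOT API: the sequence-number
# header is an assumption added by the application layer.

class DatagramReorderer:
    """Drops duplicates and releases payloads in sequence order."""

    def __init__(self):
        self.next_seq = 0      # next sequence number to deliver
        self.pending = {}      # out-of-order payloads keyed by sequence number

    def receive(self, seq, payload):
        """Accept one datagram; return the payloads that are now deliverable."""
        if seq < self.next_seq or seq in self.pending:
            return []          # duplicate: already delivered or already buffered
        self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


rx = DatagramReorderer()
delivered = []
for seq, payload in [(0, "a"), (2, "c"), (0, "a"), (1, "b"), (2, "c")]:
    delivered.extend(rx.receive(seq, payload))
```

This sketch does not recover from silent loss: a missing sequence number stalls delivery, so a multi-hop deployment would additionally need a timeout or retransmission scheme.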

The radiogram protocol operates in a client-server manner. When the message transport service 
is created, a server thread is created to receive incoming messages. When a message is received, 
it is added to an internal buffer within the service. An agent will subsequently perceive messages 
through the use of a perceptor. 

When an agent is sending a message, it attempts to open a datagram client connection. The 
datagram server connection must be open at the destination. With datagrams, a connection opened 
with a particular address can only be used to send to that address. The wireless message transport 
service only allows agents to send messages of a maximum size. If the content of the message is 
greater than the limit, it is first split into a number of sub-messages within an actuator and each 
sub-message is then sent using the message transport service. When all sub-messages have been 
received, the entire message is reconstructed within a perceptor and then added to the belief set of 
the agent. 
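
The splitting and reconstruction steps can be illustrated as follows. This Python sketch is not the AFME implementation: the size limit of eight characters, the sub-message layout (message identifier, index, total, chunk), and the names split_message and Reassembler are assumptions chosen for the example.

```python
MAX_PAYLOAD = 8  # assumed limit in characters, chosen only for the example


def split_message(message_id, content, limit=MAX_PAYLOAD):
    """Split content into (message_id, index, total, chunk) sub-messages."""
    chunks = [content[i:i + limit] for i in range(0, len(content), limit)] or [""]
    return [(message_id, i, len(chunks), c) for i, c in enumerate(chunks)]


class Reassembler:
    """Collects sub-messages and returns the full content once complete."""

    def __init__(self):
        self.parts = {}  # message_id -> {index: chunk}

    def add(self, sub_message):
        message_id, index, total, chunk = sub_message
        received = self.parts.setdefault(message_id, {})
        received[index] = chunk          # a repeated index simply overwrites
        if len(received) < total:
            return None                  # still waiting for sub-messages
        del self.parts[message_id]
        return "".join(received[i] for i in range(total))


subs = split_message("m1", "removeData(alice) now")
rx = Reassembler()
results = [rx.add(s) for s in reversed(subs)]  # arrival order does not matter
```

Carrying the index and total in every sub-message lets the receiving side tolerate duplicated and reordered sub-messages; only the completed content would be added to the belief set.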

One of the core features of AFME is the support for agent migration. For the Sun SPOT 
platform, this support is delivered through the AFME wireless migration service. Agent migration 
is often classified as either strong or weak. This classification is related to the amount of information 
transferred when an agent moves. The more information transferred, the stronger the mobility. 
Within AFME, support is only provided for the transfer of the agent’s mental state. Any classes 
required by the agent must already be present at the destination. AFME has been developed for 
the Java Micro Edition (JME) Connected Limited Device Configuration (CLDC), which does 
not contain an API to dynamically load foreign classes. Only classes 
contained, and preverified, in the deployed jar can be dynamically loaded through the use of the 
Class.forName method. This is also one of the reasons why component deployment frameworks, 
such as OSGi [30], cannot be used for CLDC applications. 

In the Squawk JVM, which operates on Sun SPOTs, it is possible to migrate an application 
to another Squawk-enabled device. Squawk implements an isolate mechanism, which can be used 
for a type of code migration. Isolate migration is not used in AFME. The reason for this is that 
isolate migration is dependent on internal details of the JVM and is therefore not really platform 
independent in the sense that an isolate can only be transferred to another Squawk JVM. It could 
not be used to transfer an application to a C or C++ CLDC JVM written for a mobile phone, 
for instance. Additionally, with isolates, it is necessary to migrate the entire application or 
platform, rather than just a single agent. 

This AFME migration service uses both the Sun SPOT radiogram protocol and the radiostream 
protocol. The radiostream protocol operates in a similar manner to TCP/IP sockets. It provides 
reliable, buffered, stream-based communication between motes. This, however, comes at a cost in 
terms of power usage. The reason this approach is adopted for agent migration is that we wish to 
ensure that the agent does not become corrupted or lost due to the migration process. If a message is lost 
or corrupted, the system can recover by resending the message. If an agent is lost or corrupted, it cannot 
be recovered without duplication or redundancy, which would also use up resources and would 
become complex to manage as agent artifacts would be scattered throughout the network. 

The problem with the radiostream protocol, however, is that both the target platform and the 
source platform must know each other’s MAC address before a connection can be established. That 
is, it does not adopt a client-server approach or operate in a similar manner to the radiogram 
protocol. In a dynamic mobile agent setting, it is unlikely that the addresses of the platforms of 
all source agents will be known a priori at compile time. To get around this problem, when an 
agent wishes to migrate to a particular platform, initial communication is facilitated through the 
use of datagrams. Using datagrams, the platforms exchange address and port information and 
subsequently construct a radiostream. Once the radiostream is established, the agent is transferred 
through the reliable connection and then terminated at the source. Subsequently, the stream 
connection is closed. At the destination, the platform creates and starts the agent. 
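
The order of the migration steps can be summarized in a small in-memory simulation. The Python sketch below is not the AFME wireless migration service: the Platform class, its method names, the port number, and the use of JSON for the mental state are all assumptions. A direct method call stands in for the datagram exchange and for the reliable radiostream.

```python
import json


class Platform:
    """In-memory stand-in for a mote platform; all names are hypothetical."""

    def __init__(self, address):
        self.address = address
        self.agents = {}   # agent name -> mental state (beliefs only here)

    def handle_migration_request(self, source_address):
        """Datagram phase: reply with the address and port to stream to."""
        return {"address": self.address, "port": 42, "peer": source_address}

    def accept_agent(self, payload):
        """Stream phase: rebuild the agent from its serialized mental state."""
        state = json.loads(payload)
        self.agents[state["name"]] = state["beliefs"]

    def migrate(self, name, destination):
        """Move one agent: handshake, transfer, then terminate at the source."""
        endpoint = destination.handle_migration_request(self.address)
        payload = json.dumps({"name": name, "beliefs": self.agents[name]})
        destination.accept_agent(payload)   # stands in for the reliable stream
        del self.agents[name]               # terminate only after the transfer
        return endpoint


source, target = Platform("A"), Platform("B")
source.agents["monitor"] = ["temperature(21)"]
endpoint = source.migrate("monitor", target)
```

Only the mental state is serialized, which reflects the weak form of migration described earlier: the destination must already hold the classes that the agent needs.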


19.3 Motivating Example 


In this section, we describe a health care-oriented example that is a potentially suitable testbed for 
future examination of the idea of integrating agent-based negotiation models with wireless sensor 
networks (WSNs) through the AFME. 

The increasing number of older people around the world draws attention to 
the health care monitoring systems deployed within smart houses, which reflect the world's increasing 
emphasis on independent living [31]. In addition, one of the current priorities for UN activities 
targeting aging is to support active aging, where older people play a central role through continuous 
participation in social, economic, and cultural aspects of world society [32]. The key goal of the 
active aging framework is maintaining autonomous and independent living for the older population in 
the home environment. We believe that intelligent WSNs that incorporate approaches from 
artificial intelligence will enhance the quality of life for aging people. 

The number of people over 65 with cardiovascular diseases is growing every year [33]. Recent 
activities of the European Union are placing more industrial and academic emphasis on how to support 
and enhance the idea of independent living [34]. To enable a high level of patient mobility and 
free patients from being connected to expensive hospital equipment, a number of remote 
monitoring systems have been introduced, for example [35]. The basic principle is to track the condition 
of the human cardiovascular system by means of a daily transmission of medically related events 
and vital signs. This approach reduces in-clinic visits by up to 43% and results in patients 
leading a satisfying, independent, and safe life [36]. It does, however, require a large amount of 
information to be obtained from a limited number of measuring devices and, consequently, it also 
requires highly reliable sensor interactions. 

To increase the level of self-confidence, mobility, and autonomy of the older person, ambient 
assisted living (AAL) systems have been introduced. They are made up of state-of-the-art sensors [37], 
robotics [38], and wireless communications, all of which are research areas that are increasingly becoming 
a main focus for the scientific community. The primary task of AAL is to support daily activities through 
natural interactions and analysis of the person's behavior. Recent research activities on AAL have resulted 
in the creation of complex frameworks, such as the one presented in [39], that use system intelligence 
and WSNs to build an awareness of the condition of the older person. 

Creative thinking and an in-depth understanding of the scientific principles that 
target social and health aspects have helped society improve the quality of life for the elderly 
population. In response to the current state of assisted living, the concept described 
in this example combines the fields of wireless communications, cognitive networks, low-power systems, 
artificial intelligence, robotics, biomedical engineering, and health informatics. 

Figure 19.2 The implantable electronic cardiovascular device and smart environments. 

This example deals directly with the concept of achieving seamless communications 
in a biologically driven AAL system with an emphasis on health care. Today's AAL systems respond based 
on context obtained from embedded sources such as motion, humidity, water, and gas sensors, or from 
wearable sensors. As shown in Figure 19.2, the testbed we propose will be built upon an implantable 
electronic cardiovascular device (IECD) integrated into an AAL system as a complementary source 
of vital signs to increase the responsiveness of active health care. This testbed 
is expected to be effective enough to guarantee fast and accurate responses to the needs of the 
person concerned. 

In this example, the IECD will bring significant extra information to bear on any analysis 
that might take place and on what responses may be forthcoming from the AAL. The vital signs 
received directly from the IECD can mean that steps are taken to prevent serious situations from 
developing. Furthermore, the focus on sustainable communication will ensure that the system will 
be capable of functioning irrespective of the activity of the user. The development of the seamless 
interactions between the core intelligent system and the biological active element is based on the 
wireless communications while respecting complexity and functionality. 

The goal of this example is to emphasize the importance of wireless communication devices and 
networks, artificial intelligence and ambient systems, cognitive modeling, computational organizational 
theory, health informatics, implantable devices and their integration within the environment, and 
human and system behavior analysis with respect to health care support, safety and security, 
independence, and mobility. Their symbiotic combination will work toward a highly efficient 
environment that is capable of self-adjustment and learning in response to the patient's 
conditions (Figure 19.3). The methodology consists of a set of tools ranging from infrastructural and 
behavioral analysis and modeling to implantable device testing within the AAL context. 

Figure 19.3 A general look on wireless dynamic heterogeneous environments. 


The proposed example involves an analysis of system configuration time, latency and accuracy, 
system responsiveness, sensor sensitivity, vital signs and their timeliness, communication frequency 
range, data rate and transmission channel, element interactions and timing, and the level of context 
awareness as a trade-off between scalability, complexity, and security. The objective is to create 
a reliable wireless communication system within the AAL context that has a strong impact on 
system responsiveness for health care support and daily assistance. 

The results of experimenting with the testbed have the potential to impact positively on the future 
development of wireless communications between humans and their environment, serving as 
a reliable and fast link between them. A better understanding of the AAL context infrastructure 
and the IECD mechanism will be gained in order to provide the interconnections between them. At the same time, 
in-depth research and analysis of system and human behaviors, together with their modeling and simulation, 
are contributing elements of the project development that address the challenge of signal propagation inside the 
house. Medical data integration, environmental impact, and social acceptance studies will create a 
base of knowledge for the development of seamless communications in the AAL system. 

The correct analysis of the vital signs interrogated from the IECD, together with studies of human 
behavior that focus on the particular target group, will support the development of a mobility model for 
a robust communication link. Communication engineers will focus on feasibility and safety 
to verify the efficiency of the available information. 


19.4 Conclusions and Future Work 


In Section 19.1 we looked at the capabilities and interactions of WSNs by reflecting them through 
the literature of MASs that are responsible for allocating resources within complex and sometimes 
distributed environments. We discussed the idea of having sensors/agents negotiate to reach agreements 
and achieve goals. We then gave an overview of several approaches to agents' negotiation 
from different contexts. In Section 19.2 we provided an overview of the AFME and discussed 
implementation details in relation to its deployment on WSN motes, such as the Java-enabled 
Sun SPOT. In Section 19.3 we described a health care-oriented example that is potentially a suitable 
testbed for future examination of our idea of integrating agent-based negotiation protocols 
with WSNs. 

In the near future, we plan to extend our research in several related directions in order 
to tackle the negotiation aspect of AFME so that the efficiency of interactions between wireless 
sensors is enhanced. We will also be working on implementing a case-specific negotiation model 
for the Java-based Sun SPOT motes and examining the performance before and after the integration 
of the negotiation model. 


Event detection allows a wireless sensor network to reliably detect application-specific events 
based on data sampled from the environment. Events, in this context, may range from the 
comparatively easy to detect outbreak of a fire to the more complex stumbling and falling of a 
patient, or even to an intrusion into a restricted area. Distributed event detection leverages the 
data processing capability of the sensor nodes in order to locally extract semantically meaningful 
information from the sensed data. This process may involve one or multiple nodes in a given region 
of the deployment, and generally avoids data transmissions to the base station of the network. This 
in turn reduces the overall energy consumption and ultimately leads to a prolonged lifetime of the 
network. 

In this chapter, we motivate the need for in-network data processing and event detection, 
and briefly review the current state of the art. We then present in detail our own approach to 
distributed event detection and discuss hardware platform, software architecture, and algorithms 
and protocols used in the detection process. Our approach has been tested extensively in several 
real-world deployments and we give an overview of the results from these experiments. The results 
illustrate how distributed event detection can achieve both high accuracy and energy efficiency, 
and thus mark the advantages of this approach to data processing in sensor networks. 


20.1 Introduction 


Event detection is a special form of in-network data processing for wireless sensor networks (WSNs) 
that pushes the logic for application-level decision making deeply into the network. Raw data 
samples from the sensors are evaluated directly on the sensor nodes in order to extract information 
that is semantically meaningful for the given task of the WSN. For example, a WSN that is tasked 
with detecting a fire will process the temperature readings from a sensor and only send an alert 
to the base station of the network if the readings reliably exceed a threshold value [1]. More 
complex examples include scenarios such as vehicle tracking [2], area surveillance [3], undersea 
monitoring [4], or the classification of human motion sequences [5]. 

Figure 20.1 Centralized data processing vs. decentralized event detection. 

In all of these deployments, sampled data are processed and evaluated close to their source, thereby 
reducing communication with the base station, minimizing energy consumption, and extending 
the lifetime of the network. As illustrated in Figure 20.1, this is achieved by programming the sensor 
nodes in such a way that the occurrence of deployment-specific semantic events can be established 
directly on sensor nodes, relying only on the locally available data from the sensors. Depending 
on the application, a variety of sensors can be employed, including light and temperature sensors, 
accelerometers, microphones, or even cameras. The transformation of data from the physical 
sensory domain into the scenario-specific application domain stands in contrast to other techniques 
for in-network data processing, for example, compression and aggregation, which are agnostic to 
the application-level properties of the data. 

In this chapter, we give an overview of current approaches to event detection in WSNs and 
present our own exemplary work in more detail. Our research focuses on the distributed detection 
of events based on one- or multi-dimensional motion patterns. A prime example of a potential 
application is a WSN in which sensor nodes are equipped with acceleration sensors. These sensor 
nodes are then mounted onto a fence and the movement of the fence measured with the goal of 
deciding whether a security-relevant event, for example, an intrusion into a restricted area, has 
occurred. The system is trainable to detect new types of events and, given the distributed nature 
of the algorithms employed, resilient to failures of individual nodes. We have implemented and 
evaluated our approach through a series of real-world deployments on several construction sites 
and summarize our findings in this text. 

We begin our treatment of the subject matter with a review of the state of the art in Section 20.2. 
Afterward, we proceed with the detailed description of our platform for distributed event detection, 
AVS-Extrem [6], and cover the hardware platform, the software stack, and the event detection 
algorithms and protocols in Section 20.3. In Section 20.4, we then present experimental results 
from several deployments of this system, and finally conclude in Section 20.5. 


20.2 State of the Art 


Since event detection is one of the key areas of application of sensor networks, there is a large 
body of work, and multiple deployments have already been undertaken. In this section, we give an 
overview of the most common system architectures and event detection algorithms. 


20.2.1 Architectural Approaches 


As a first step, we discuss several possible system and networking architectures and evaluation 
strategies for event detection WSNs and their advantages and disadvantages. The networking 
architecture has a high impact on the reliability and energy consumption of the sensor network 
and is highly dependent on the application and the complexity of the events that need to be 
detected. 


20.2.1.1 Local Evaluation 


Local evaluation describes a method according to which one sensor node gathers and processes data 
by itself and decides whether a certain event has occurred or not. The results are sent to a base 
station where an alert can be triggered or the user can be asked to confirm the findings of the sensors. 
One system using this method is proposed by Yang et al. [5] and is aimed at recognizing human 
motions. It is a body area network consisting of eight sensor nodes attached to a person’s body who 
may perform 1 out of 12 actions. Data are delivered from an accelerometer and a gyroscope and 
are processed on each node. If local data processing results in a recognized event, then data of the 
node are transmitted to the base station and processed once again. 


20.2.1.2 Centralized Evaluation 


The centralized evaluation architecture is possibly the most widely used: All nodes in a WSN 
communicate exclusively with the central base station, which has much more computing power 
and energy resources than the sensor nodes. The communication can contain the sensors’ raw data 
or the signal processing results (i.e., compression or reduction), which takes place on each of the 
nodes. Individual sensor nodes have no knowledge about the semantics of the collected data. The 
complete data are gathered and interpreted at the base station. This method has several advantages, 
for example, the high detection accuracy that is possible with a complete centralized overview. The 
main disadvantage of this approach is that the network suffers from poor performance for large 
numbers of nodes. Also, the system may run into energy problems as the whole network needs to 
constantly communicate with the base station and energy-saving techniques cannot easily be applied. 

As mentioned earlier, several projects use this method to detect events in their WSNs. An 
exemplary implementation is presented by Wang et al. [7] who describe a habitat monitoring 
system that is capable of recognizing and localizing animals based on acoustics. They employ a 
cluster head with additional processing capabilities that may request compressed raw data from 
other nodes for centralized evaluation. Animals are identified by the maximum cross-correlation 
coefficient of an observed spectrogram with a reference spectrogram. Using reports from multiple 
sensor nodes, the animals are then localized in the field. 
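
The matching criterion can be illustrated with a simplified, one-dimensional version. The Python sketch below is not the implementation of Wang et al.: it works on short number sequences instead of spectrograms, the species labels and values are invented, and the function names are hypothetical. It assumes that the observation is at least as long as each reference.

```python
import math


def normalized_correlation(a, b):
    """Pearson-style correlation of two equal-length sequences (0.0 if flat)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0


def max_cross_correlation(observed, reference):
    """Slide the reference over the observation; return the best coefficient."""
    n = len(reference)
    return max(normalized_correlation(observed[i:i + n], reference)
               for i in range(len(observed) - n + 1))


def identify(observed, references):
    """Return the label whose reference correlates best with the observation."""
    return max(references,
               key=lambda label: max_cross_correlation(observed, references[label]))


references = {"frog": [0, 1, 0, 1], "bird": [1, 2, 3, 4]}  # invented templates
observed = [5, 5, 2, 4, 6, 8, 5]
label = identify(observed, references)
```

Because the coefficient is normalized, the scaled copy of the bird template inside the observation still yields a perfect match.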


20.2.1.3 Decentralized Evaluation 


The decentralized evaluation approach forms smaller clusters within the WSN. Each cluster has a 
special node that performs the task of a cluster head. This node is responsible for classifying the data 
and communicating with the base station. One advantage of this architecture is that it is fault 
tolerant against the malfunction of single nodes or the loss of one whole cluster. If one cluster fails, 
the other clusters remain functional. With regard to energy awareness, this architecture also has 
advantages because it is possible to put clusters that are not needed or not triggered by an event 
into an energy-efficient idle mode. 

An exemplary deployment has been conducted as part of the SensIT project. Duarte and 
Hu [2] evaluated several classification algorithms in a vehicle tracking deployment. After gathering 
the acoustic and seismic data, each sensor node classifies the events by using extracted 
features. The features are extracted from the frequency spectrum after performing a fast Fourier 
transform. The evaluation comprises three classification algorithms: k-nearest neighbor, machine 
learning, and support vector machines. The classification result is sent to a fixed cluster head for 
evaluation and is combined with reports received from other nodes for tracking a vehicle. 
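
The node-local part of such a scheme, spectral features followed by a k-nearest neighbor decision, can be sketched as follows. This Python fragment is not the code used by Duarte and Hu: for brevity it computes a naive discrete Fourier transform instead of a fast Fourier transform, and the synthetic sine traces, the labels, and the function names are assumptions.

```python
import cmath
import math
from collections import Counter


def spectrum_features(samples, bins=4):
    """Magnitudes of the first `bins` non-DC DFT coefficients (naive O(n^2) DFT)."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))) / n
            for k in range(1, bins + 1)]


def knn_classify(features, training, k=3):
    """Majority label among the k nearest training vectors (Euclidean distance)."""
    nearest = sorted(training, key=lambda item: math.dist(features, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def tone(cycles, n=16):
    """Synthetic stand-in for a sensor trace: a sine with `cycles` periods."""
    return [math.sin(2 * math.pi * cycles * t / n) for t in range(n)]


# Invented training set: two traces per class with slightly different amplitude.
training = [(spectrum_features(tone(1)), "truck"),
            (spectrum_features([1.1 * x for x in tone(1)]), "truck"),
            (spectrum_features(tone(3)), "car"),
            (spectrum_features([0.9 * x for x in tone(3)]), "car")]
label = knn_classify(spectrum_features([0.95 * x for x in tone(3)]), training, k=3)
```

On a real node, the resulting label rather than the raw trace would be sent to the cluster head.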

A decentralized event detection system is also proposed by Martincic and Schwiebert [8]. Sensor 
nodes are grouped into cells based on their location. All nodes in a cell transmit their data samples 
to a cluster head that averages the results and retrieves the averages from adjacent cells. Event 
detection is performed on the cluster heads by arranging the collected averages in the form of a 
matrix and comparing it to a second predefined matrix that describes the event. An event is detected 
if the two matrices match. 
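
A minimal sketch of this matrix comparison is given below. It is not the implementation of Martincic and Schwiebert: the tolerance used to decide whether two entries match, the temperature values, and the function names are assumptions made for the example.

```python
def cell_average(samples):
    """Cluster head step: average the samples reported by the nodes of one cell."""
    return sum(samples) / len(samples)


def matches_event(averages, pattern, tolerance=1.0):
    """True if every cell average lies within `tolerance` of the event pattern."""
    return all(abs(a - p) <= tolerance
               for row_a, row_p in zip(averages, pattern)
               for a, p in zip(row_a, row_p))


# Invented readings for a 2 x 2 grid of cells, two nodes per cell.
readings = [[[20, 22], [21, 21]],
            [[58, 62], [40, 42]]]
averages = [[cell_average(cell) for cell in row] for row in readings]
fire_pattern = [[21, 21],
                [60, 40]]   # predefined matrix that describes the event
detected = matches_event(averages, fire_pattern)
```

A tolerance is used here because exact equality of averaged sensor values would rarely hold in practice.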


20.2.1.4 Distributed Evaluation 


The distributed evaluation method differs from the decentralized evaluation in two important 
points. First, all event detection and data processing take place in the network and no data except 
for the signaling of an event are sent to the base station. Second, there are no designated cluster head 
nodes in the network. Instead, all nodes are able to perform the data processing task by themselves. 
In contrast to the previously discussed approaches, a cluster head is not needed since a leader node 
is chosen dynamically every time an event occurs. The result of a detection is distributed to all 
nodes that also detected the event and is compared with their own results. The node that takes the 
leading role is chosen by an application-dependent algorithm and may, for instance, 
be a node that is physically close to the event or has an otherwise advantageous position 
within the network. This leader transmits the results of the distributed evaluation to the base 
station. 
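
Dynamic leader selection and the fusion of local results can be sketched as follows. The election criterion is application dependent; the Python fragment below assumes, purely for illustration, that the node with the strongest signal becomes the leader and that the leader fuses local classifications by majority vote. The node identifiers, values, and function names are invented.

```python
def elect_leader(reports):
    """Pick the node with the strongest signal; ties go to the lowest node id.

    `reports` maps node id -> (signal_strength, local_classification).
    """
    return min(reports, key=lambda node: (-reports[node][0], node))


def fuse(reports):
    """Leader-side fusion: majority vote over the local classifications."""
    votes = {}
    for _, label in reports.values():
        votes[label] = votes.get(label, 0) + 1
    return max(sorted(votes), key=votes.get)  # sorted() makes ties deterministic


reports = {7: (0.4, "climb"), 3: (0.9, "climb"), 5: (0.9, "shake")}
leader = elect_leader(reports)   # this node reports to the base station
event = fuse(reports)
```

Since every node applies the same deterministic rule to the same set of reports, all participating nodes arrive at the same leader without a fixed cluster head.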

The system presented in Section 20.3 of this chapter also falls into this category. Moreover, 
Li and Liu [9] propose an event detection system for coal mine surveillance to localize collapse 
events. The Structure-Aware Self-Adaptive principle of the 3D environment surveillance is used 
to detect falling nodes or changing positions of sensor nodes by using acceleration data, Received 
Signal Strength Indicator (RSSI) evaluations, neighbor loss, and some acoustic analysis. All nodes 
have to be set up with an initial known position in the mine. A beacon-based communication is 
periodically initiated to set up the neighborhood topology of each node. In the case of an event, 
the measured data are mapped with a random hash-function and transformed into a data signature 
file that is transmitted to the base station. 


20.2.2 Algorithmic Approaches 


We now shift our focus from the architectural and networking aspects to the data processing aspects 
of the event detection. Several methods to detect an event and to distinguish it from noise have 
been proposed in the literature and are deployed in real-world installations of WSNs. The overall 
trade-off in this area is—as we will explore in this section—the weighting of algorithmic simplicity 
against the complexity of the events that the system is able to detect. Another interesting aspect to 
consider is whether a system is capable of learning events on its own, or whether expert knowledge 
needs to be embedded into the detection algorithm. 


20.2.2.1 Threshold Usage 


Current approaches to integrate event detection in WSNs often apply a threshold detection, either 
as a trigger for a more intensive gathering of data or as an event itself. Threshold values are suitable 
for a lot of applications, for example, fire detection, detection of flooding, or generally other 
applications in which a sensor can detect a critical boundary of the measured value. Although 
this method is very efficient and robust for simple detection problems, it is unable to detect more 
complex events such as movement patterns, faces, or speech. In a simple scenario, a possible 
drawback of the system is that every node that detects a value exceeding the threshold will start to 
communicate with the base station. This can lead to an early energy shortage in the network if 
events occur often, or to network congestion if events occur simultaneously at the sensor nodes. 
These problems can be avoided by resorting to sophisticated networking architectures, as discussed 
previously. 
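
A threshold detector of this kind is easy to sketch. The Python fragment below is a generic illustration and is not taken from any of the cited systems: the threshold of 60.0, the requirement of three consecutive exceeding samples, and the class name are assumptions. Requiring several consecutive samples is one simple way to suppress spurious readings and to send only one alert per event.

```python
class ThresholdDetector:
    """Raises an alert only after `required` consecutive samples exceed `threshold`."""

    def __init__(self, threshold, required=3):
        self.threshold = threshold
        self.required = required
        self.streak = 0  # number of consecutive samples above the threshold

    def update(self, value):
        """Feed one sample; return True exactly when the alert should be sent."""
        self.streak = self.streak + 1 if value > self.threshold else 0
        return self.streak == self.required


detector = ThresholdDetector(threshold=60.0, required=3)
temperatures = [22.0, 65.0, 23.0, 61.0, 70.0, 82.0, 90.0]  # invented readings
alerts = [i for i, t in enumerate(temperatures) if detector.update(t)]
```

The isolated spike at the second sample does not trigger an alert, and the sustained rise triggers exactly one.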

A typical application for a threshold-based event system is a system to detect fire, like the one 
Doolin and Sitar described in [1]. Temperature and humidity sensors are fitted to sensor nodes and 
deployed in an area that is to be monitored. When a fire occurs, the sensors measure temperatures 
and humidity values that do not occur under normal environmental conditions. The nodes can 
then assume that a critical event has occurred and can alert the base station. 


20.2.2.2 Pattern Recognition 


Pattern recognition algorithms are able to process a wide variety of inputs and map them to a 
known set of event classes. As WSNs deal with a lot of data from different sources that are used to 
measure complex incidents, it is evident that pattern recognition algorithms can also be found in 
this domain. Pattern recognition on signals is commonly implemented by subdividing the process 
into several steps that can be executed by different entities of the system. During the sampling 
process, raw data are gathered, optionally preprocessed, and handed over to the segmentation process 
that detects the start and the end of the samples that belong to an event. The segmented data are 
then handed over to a feature extraction algorithm that extracts highly descriptive features from 
the data and creates a feature vector for the data segment. Features can be all kinds of descriptive 
attributes of the data such as a histogram, a spectrogram, the minimum, the maximum, and so 
forth. The final step is the classification as part of which the feature vector is statistically analyzed 
or compared to either previously trained or fixed features like threshold values. In most cases, a 
prior training is necessary to deliver a sufficient set of training data that initializes the classifier. An 
in-depth introduction to this subject is available in Duda et al. [10] and Niemann [11]. 
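
The steps of this processing chain can be sketched end to end. The Python fragment below is a deliberately small illustration and does not reproduce any of the cited systems: segmentation uses a fixed amplitude threshold, the feature vector contains only minimum, maximum, and duration, classification picks the nearest prototype, and the prototype values and event names are invented.

```python
import math


def segment(samples, threshold):
    """Return (start, end) index pairs of runs where |sample| exceeds threshold."""
    segments, start = [], None
    for i, x in enumerate(samples):
        if abs(x) > threshold and start is None:
            start = i
        elif abs(x) <= threshold and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(samples)))
    return segments


def extract_features(window):
    """Feature vector: minimum, maximum, and duration of the segment."""
    return (min(window), max(window), len(window))


def classify(features, prototypes):
    """Nearest-prototype classification by Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(features, prototypes[label]))


prototypes = {"kick": (2.0, 9.0, 2), "shake": (-3.0, 3.0, 6)}  # made-up training result
samples = [0, 0, 3, -3, 2, -2, 3, -2, 0, 0, 8, 3, 0]
events = [classify(extract_features(samples[s:e]), prototypes)
          for s, e in segment(samples, threshold=1)]
```

In a trained system, the prototypes would come from labeled training data rather than being written by hand.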

The system presented in Section 20.3 also falls into this category. Additionally, and as previously 
mentioned, Duarte and Hu [2] use pattern recognition algorithms in a vehicle tracking 
deployment. The system performs the first three steps, that is, sampling, segmentation, and feature 
extraction. The features in use are all based on frequency analysis of the signal. Afterward, each 
node classifies the event and sends the result to a fixed cluster head for evaluation and combination 
with reports received from other nodes. 


20.2.2.3 Anomaly Detection 


Anomaly detection is a term used for approaches that focus on the specific case of detecting whether 
a particularly unusual event has occurred. This is achieved by learning typical system behavior over 
time and then classifying specific events as either normal or anomalous. Approaches with this goal 
expand upon the principles of the two previously described approaches and incorporate techniques 
from the field of intrusion detection and even bio-inspired artificial immune systems. 
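
As a rough illustration of learning typical behavior over time and then flagging deviations, the sketch below keeps a running mean and variance (Welford's method) and marks a sample as anomalous when it lies more than `k` standard deviations from the mean. The class name, the `k` and `warmup` parameters, and the decision rule are assumptions for this example; it is not one of the published algorithms.

```python
import math


class RunningAnomalyDetector:
    """Learns the normal value range online and flags outliers."""

    def __init__(self, k: float = 3.0, warmup: int = 10) -> None:
        self.k = k            # allowed deviation in standard deviations
        self.warmup = warmup  # samples to observe before classifying
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0         # sum of squared deviations from the mean

    def update(self, x: float) -> bool:
        """Classify x, then learn from it if it was normal.

        Returns True if x is considered anomalous.
        """
        anomalous = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            anomalous = abs(x - self.mean) > self.k * std
        if not anomalous:
            # Welford update of mean and squared deviations
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        return anomalous
```

Anomalous samples are deliberately excluded from the model so that a single unusual event does not shift the learned normal state.
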

For example, Wälchli et al. [12] designed a distributed event localization and tracking algorithm
(DELTA) that is based on a small short-term memory to temporarily buffer normal state. The 
algorithm was deployed as part of a light sensor office surveillance with the goal of detecting and 
tracking persons carrying flashlights. DELTA provides the leader node with the information needed 
to localize and classify an event based on a simplex downhill algorithm. 

A summary of all the approaches, advantages and disadvantages, and exemplary implementations 
is given in Table 20.1. 


20.3 Exemplary Platform: AVS-Extrem 


Based on the general discussion of architectures and algorithms, we now present an exemplary 
platform for distributed event detection, called AVS-Extrem: To detect an intrusion into a protected 
area, wireless sensor nodes equipped with accelerometers are integrated into a fence surrounding 
the area. The sensor nodes distinguish collaboratively between numerous event classes like opening 
the fence or a person climbing over the fence. In order to acquire a prototype for each event 
class, the sensor nodes are trained in a real-world construction site scenario. A prototype is an 
event class abstraction incorporating significant features that are extracted from the training data.
The features are characteristic attributes that describe one or more patterns observed in the raw
data. After the training, the extracted set of features makes it possible to describe a class precisely
and to distinguish it from other classes. This way, each sensor node can assign a newly occurred
event to the appropriate event class.

Table 20.1 Overview of Proposed Architectures and Detection Algorithms

Architecture | Advantages | Disadvantages | Exemplary Implementations
Local detection | Easy to implement and energy efficient. Low hardware requirements | Only works well for simple events, i.e., those detectable by monitoring data for exceeded thresholds or sudden alterations | Yang et al. [5]
Centralized evaluation | Full control of network from one single point. Access to all raw data | Energy inefficient and traffic intensive, which may lead to bottlenecks in data transmission | Wang et al. [7]
Decentralized evaluation | Robust to failure of single nodes due to clustering. Good energy efficiency | Special-purpose cluster head nodes may be needed. Deployment influences network topology | Duarte and Hu [2]; Martincic and Schwiebert [8]
Distributed evaluation | Data are exchanged locally between nodes, events are reported once to base station. Minimal communication overhead and good energy efficiency | Hard to implement since application knowledge is distributed across the network | Li and Liu [9]; Wittenburg et al. [3]

Algorithm | Advantages | Disadvantages | Exemplary Implementations
Threshold usage | Easy to implement and very reliable | Only applicable to certain (simple) application scenarios | Doolin and Sitar [1]
Pattern recognition | Ability to detect very complex events | Hard to implement. Requires sufficient computational resources on nodes | Duarte and Hu [2]; Wittenburg et al. [3]
Anomaly detection | Very robust because system adjusts itself to minor changes in environment | Cannot be applied to scenarios that lack normal condition occurrence | Wälchli et al. [12]


20.3.1 Requirement Analysis 


Our proposed platform has to meet a set of scenario-independent and scenario-dependent
requirements as well as hardware/software requirements in order to achieve energy efficiency and to be
applicable to a wide range of deployment scenarios. We develop our own peripheral component
drivers that are integrated in our FireKernel OS. These components are application-related and
intermesh the hardware and software requirements. Therefore, we subdivide our discussion into
the following two groups, including details on the hardware and software.

Scenario-independent requirements: On the hardware side, we migrate away from our initial
MSP430-based prototype to an ARM-based MCU with additional RAM, in order to implement
more complex detection algorithms and to realize an efficient hardware and software
energy-awareness control. The MCU as master and the slaves, for example, the acceleration sensor,
support several power-saving modes. To store the application-dependent settings and the prototype
data, we use nonvolatile storage.

The sensor nodes operate in a distributed and demand-based fashion, and in an environment 
with a dynamic topology. Hence, we employ a reactive routing protocol (cf. Section 20.3.4). 
Additionally, to reduce the transceiver power consumption, we implement an energy-aware and
configurable Wake-On-Radio (WOR) duty-cycle. The sensor node is equipped with an
energy-efficient acceleration sensor that is able to wake up the MCU when an acceleration threshold
value is exceeded; otherwise, the MCU remains in power-down mode (PD). Furthermore, we apply
OS-level priority-based task processing to enable uninterrupted sampling and continuous
communication. Finally, an adequately large and appropriate feature pool is gathered for the event
detection by performing training intervals in order to enable an efficient feature set selection to 
build a prototype. 

Scenario-dependent requirements: From the hardware point of view, we choose an acceleration 
sensor based on previous experiences with the fence monitoring application. The sensor meets the 
following requirements: a range of at least 8 g to avoid overload during event sensing, a 10-bit
resolution with low noise level, and a sampling rate of up to 100 Hz to avoid aliasing. To cover a wide range
of scenarios, we implement a self-calibrating routine for the acceleration sensor. The calibration 
routine takes care of the individual positional setup of the fence elements after every event. 
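
To make the sensor requirements concrete, the following short calculation relates range and resolution to the smallest distinguishable acceleration step, and the sampling rate to the highest signal frequency that can be represented without aliasing. It assumes a symmetric measurement range of plus or minus 8 g, which the text does not state explicitly; the function names are illustrative.

```python
def lsb_size_g(full_scale_g: float, bits: int) -> float:
    """Acceleration per ADC count for an assumed symmetric +/- full_scale_g range."""
    return 2 * full_scale_g / (2 ** bits)


def nyquist_hz(sampling_rate_hz: float) -> float:
    """Highest signal frequency representable without aliasing."""
    return sampling_rate_hz / 2


# 10-bit resolution over +/- 8 g gives steps of about 15.6 mg.
step_g = lsb_size_g(8, 10)

# Sampling at 100 Hz captures signal components up to 50 Hz.
f_max = nyquist_hz(100)
```
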

The battery case is closed with a screw cap that is actuated by a spring. In rare cases, a
vertical shock can compress the spring, which disconnects the battery. This problem is solved by
an additional capacitor that bridges the resulting temporary power failure of up to 60 ms. Since we
target rough environments, we package the sensor node within a housing that resists severe weather 
conditions and heavy shocks. 


20.3.2 Sensor Node Hardware 


The AVS-Extrem wireless sensor nodes, as depicted in Figure 20.2a, were developed to meet
the demands of motion-centric applications and localization [6]. Each node is equipped with an
ARM7-based NXP LPC2387 MCU operating at a maximum speed of 72 MHz with 96 KB RAM
and 512 KB ROM, and a CC1101 radio transceiver from Texas Instruments operating in the
868 MHz ISM band. An LTC4150 coulomb counter is used to monitor the state of the batteries.
For acquiring the movement data, a low-power triaxial acceleration sensor is employed. We
chose a Bosch SMB380 sensor with a range of up to 8 g and 10-bit resolution that adds only very
little noise and has a configurable sampling rate of up to 1500 Hz. The sensor wakes up the system
for further processing once an acceleration above a user-defined threshold occurs.

Figure 20.2 AVS-Extrem platform. (a) Sensor node and (b) system architecture.

In addition to the core components explained previously, the PCB also carries a temperature 
and humidity sensor to monitor environmental parameters. Optionally, a Falcom FSA03 GPS 
sensor can be employed, as well as an SD Card slot to store individually trained motion patterns 
and current battery status in a nonvolatile storage. Adding new external peripherals to the platform 
is possible by attaching them to the provided connector accessing the SPI or GPIO interface. 

The sensor nodes are mounted in a very stable and fixed way utilizing a rugged housing within 
the fence that also contains the long-term battery supply [13]. The housing makes it possible to
conduct experiments in a highly repeatable manner with identical settings.


20.3.3 Software Stack 


As shown in Figure 20.2b, the system layer of our architecture contains the AVS-Extrem sensor 
board, the operating system, and the energy management. All components of the system layer are 
briefly introduced in the subsequent sections, while the details can be found in [13]. 

We employ FireKernel [14] as a low-power operating system to support our concepts and
to evaluate concurrent algorithms on a sensor platform that features threading and
priority-based preemptive multitasking. A flexible energy management that supports WOR and numerous
MCU-based energy saving techniques has also been implemented. For secured communications,
we implemented the ParSec security layer that employs symmetric cipher algorithms in CBC mode
in conjunction with an initialization vector and a counter to provide confidentiality, authenticity,
integrity, freshness, and semantic security for the wireless communication [15].

The application layer contains an optional Dempster-Shafer-based data quality estimator that
is used to establish whether the reliability of a measurement is high or low. Finally, the system 
comprises the distributed event detection module, which is described in Section 20.3.5. 


20.3.4 Networking Protocols 


Multiple base nodes (BNs) are adopted in order to provide fault-tolerant connections between 
arbitrary nodes in the WSN and the base station of the sensor network. Figure 20.3 depicts a 
configuration of a network with two base nodes. Base nodes act as proxies between the WSN and 
the base station and also operate as a monitor of the network. The base station is connected to all 
base nodes via either a wireless or a wired connection. For monitoring and data collection purposes, 
the base station exchanges messages with the nodes. The messages contain information about the 
network state such as temperature or battery charge level of a node. Furthermore, they report
events occurring in the surrounding area, such as shaking the fence or kicking against the fence. In
case of failure or non-reachability of a base node, an alternative base node is selected. This way, the 
reliability of the network and the accessibility of the base station are improved. 

Figure 20.3 Data transmission over base nodes to base station.

Figure 20.4 Course of action of a micro mesh route discovery process.

The implementation of the BN principle relies on the micro mesh routing (MMR) protocol
[14] to benefit from a dynamic and fault-tolerant routing protocol covering changes in the network
topology. MMR is a reactive routing protocol and combines aspects of different routing protocols
such as AODV and DSR. AODV-like functionality is used to take advantage of hop-by-hop
routing during the data transmission, while the principle of DSR makes it possible to collect partial
route information in the course of the discovery phase. In contrast to a strictly reactive protocol,
intermediate nodes analyze all forwarded packets to update their routing tables. Sequence numbers
are used to prevent the formation of loops, to avoid the reception of duplicated messages, and to
determine the freshness of a route.

MMR defines three message types: route request (RREQ), route reply (RREP), and route error
(RERR). It comprises two operation modes: route discovery and route maintenance. A route
discovery process is triggered when a node intends to send a packet to a destination node whose
address is not available in the local routing table. In contrast to AODV, each intermediate node
appends its own address to the route record, which is a concatenated list of the hop addresses
traversed by the route request packet. Furthermore, the intermediate node extracts the routes from the
route record of the received RREQ message and saves them in its routing table. In the case of a
link failure, for example, due to node movement, the node that detects the error and is closest
to the error source initiates a route maintenance process to notify other nodes about the link loss.
The route discovery is illustrated in Figure 20.4.
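
The handling of the route record at an intermediate node can be sketched as follows. This is a simplified illustration, not the MMR implementation: the function name, the routing table layout (destination mapped to next hop and hop count), and the loop check on the route record are assumptions. The protocol itself uses sequence numbers for loop prevention, which are omitted here.

```python
from typing import Dict, List, Optional, Tuple

# destination address -> (next hop, hop count); assumed table layout
RoutingTable = Dict[str, Tuple[str, int]]


def process_rreq(node: str, route_record: List[str],
                 routing_table: RoutingTable) -> Optional[List[str]]:
    """Learn reverse routes from an RREQ route record, then append own address.

    Returns the extended route record to forward, or None if the
    request already passed through this node.
    """
    if node in route_record:
        return None  # simplification: drop instead of checking sequence numbers
    prev_hop = route_record[-1]  # neighbor that delivered the RREQ
    # Every address in the record is reachable via prev_hop.
    for hops, dest in enumerate(reversed(route_record), start=1):
        known = routing_table.get(dest)
        if known is None or hops < known[1]:
            routing_table[dest] = (prev_hop, hops)
    return route_record + [node]
```

For a request from `S` that travels via `A` to `B`, node `B` ends up with routes to both `A` (one hop) and `S` (two hops), each using `A` as the next hop.
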


20.3.5 Event Detection Algorithms and Protocols 


The event detection approach implemented in our system consists of a combination of algorithms 
and protocols that observe and evaluate events distributively across several sensor nodes. This 
distributed event detection system does not rely on a base station or any other means of central 
coordination and processing. In contrast, current approaches to event detection in WSNs transmit
raw data to an external device for evaluation or rely on simplistic pattern recognition schemes.
This implies either high communication overhead or low event detection accuracy, especially for 
complex events (cf. Section 20.2). 

In a specific deployment of our system, the nodes are equipped with accelerometers and attached 
to a fence at fixed internode distances. When an event occurs, a lateral oscillation will propagate 
through the interconnected fence elements. As a result, the nodes can sample the acceleration at 
different distances from the event source. According to the neighbor-relative position, which is 
encoded in a unique node ID, each sensor can position itself relative to its neighbors. A node can
deduce its relative position simply by comparing its own ID with the neighbor ID. The relative
position of the neighbors is required for each sensor node affected by the event, in order to perform
a feature fusion. If an event occurs, each affected node concatenates its own features and the received
features from the neighbors based on the relative sender position. This overall process is illustrated 
in Figure 20.5 and described in more detail in [3]. 
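
The position-dependent concatenation can be sketched as below. The sketch assumes that adjacent fence elements carry consecutive integer IDs, so that the difference of two IDs gives the signed offset along the fence; the actual ID encoding is described in [3] and may differ. All names are illustrative.

```python
from typing import Dict, List


def relative_position(own_id: int, neighbor_id: int) -> int:
    """Signed offset of a neighbor along the fence (assumes consecutive IDs)."""
    return neighbor_id - own_id


def fuse_features(own_id: int, own_features: List[float],
                  received: Dict[int, List[float]]) -> List[float]:
    """Concatenate own and received features, ordered by relative sender position."""
    by_offset = {0: list(own_features)}
    for neighbor_id, features in received.items():
        by_offset[relative_position(own_id, neighbor_id)] = list(features)
    fused: List[float] = []
    for offset in sorted(by_offset):
        fused.extend(by_offset[offset])
    return fused
```

Because the ordering depends on the receiving node, two neighboring nodes build different fused vectors from the same event.

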

Figure 20.5 Distributed event detection process.

The system works in three distinct phases: In the calibration phase, the acceleration sensors are
automatically calibrated before and after each event. An application-dependent calibration routine 
takes noise and interruptions during the calibration period into account. 

The training phase is initiated by the user via the control station, which is used to accomplish the 
initial training and to calculate an optimal feature set. During the deployment, the control station 
may optionally also be used to serve as a base station for event reporting. The purpose of this phase 
is to train the WSN to recognize application-specific events by executing a set of representative 
training events for each event class. Every event class is individually trained by extracting a broad 
feature set from the sampled raw data and transmitting it to the control station. The control station 
performs a leave-one-out cross-validation algorithm [16] across all collected features to select an 
optimal subset of features. A prototype for each class is then calculated and transmitted back to 
each sensor node. The prototypes represent the a priori reference values needed for the classification 
of known patterns. 
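
A minimal sketch of feature selection by leave-one-out cross-validation is given below. The text does not specify how candidate subsets are searched or how a prototype is computed, so two assumptions are made here: subsets are enumerated exhaustively (feasible only for small feature pools), and a prototype is the per-class mean of the selected features, classified by nearest Euclidean distance.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Sample = Tuple[str, List[float]]  # (event class label, feature vector)


def prototype(vectors: List[List[float]]) -> List[float]:
    """Per-feature mean of a class's training vectors (assumed prototype)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def project(vec: List[float], subset: Sequence[int]) -> List[float]:
    """Keep only the features whose indices are in subset."""
    return [vec[i] for i in subset]


def loocv_accuracy(samples: List[Sample], subset: Sequence[int]) -> float:
    """Hold out each sample once and classify it with prototypes of the rest."""
    correct = 0
    for i, (label, vec) in enumerate(samples):
        by_class = {}
        for j, (other_label, other_vec) in enumerate(samples):
            if j != i:
                by_class.setdefault(other_label, []).append(project(other_vec, subset))
        protos = {lab: prototype(vs) for lab, vs in by_class.items()}
        x = project(vec, subset)
        predicted = min(protos, key=lambda lab: math.dist(x, protos[lab]))
        correct += predicted == label
    return correct / len(samples)


def select_features(samples: List[Sample], n_features: int):
    """Return (best subset, its LOOCV accuracy); smaller subsets win ties."""
    best_subset, best_acc = None, -1.0
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            acc = loocv_accuracy(samples, subset)
            if acc > best_acc:
                best_subset, best_acc = subset, acc
    return best_subset, best_acc
```

After selection, `prototype` would be applied once more to all training vectors of each class, restricted to the chosen subset, to obtain the values sent back to the nodes.
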

Once the training is complete, the system enters the detection phase. In this phase, the sensors 
gather raw data that are preprocessed, segmented, and normalized (cf. Figure 20.5). Only features 
used in the prototype vectors are extracted from the raw data, and then combined to form a 
feature vector. The extracted feature vector is transmitted via broadcast to all sensor nodes in
the n-hop neighborhood. n is usually set to 1 for our scenario, since the radio range of a sensor
node is significantly larger than the spatial expansion of the physical effects caused by an event.
After receiving all required feature vectors, each node performs a feature fusion by combining 
the feature vectors based on a bit-mask and the relative position of the sender. In other words, 
each node builds its own event view using feature vectors from its neighboring nodes. Obviously, 
the combined feature vector is different for each node, because the respective neighboring nodes 
perceive the event depending on their location. Ideally, only the prototype classifier running on 
the node whose event view matches the trained view will classify the correct event, while the other 
nodes reject it. Finally, only the relevant detected events that require user interaction, for example, 
an intrusion or a fire, are reported to the base station of the WSN. 
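
The idea that a node either recognizes an event or rejects it can be illustrated with a prototype classifier that has a rejection distance. The rejection rule and the `max_distance` parameter are assumptions for this sketch; the text does not state how rejection is decided.

```python
import math
from typing import Dict, List, Optional


def classify_or_reject(fused: List[float], prototypes: Dict[str, List[float]],
                       max_distance: float) -> Optional[str]:
    """Return the closest event class, or None if no prototype is close enough."""
    best_label, best_dist = None, math.inf
    for label, proto in prototypes.items():
        if len(proto) != len(fused):
            continue  # incomplete event view, cannot be compared
        dist = math.dist(fused, proto)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= max_distance else None
```

A node whose fused vector matches no trained view returns `None` and stays silent, so only the node with the matching view reports the event.
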


20.4 Experimental Results 


In this section, we summarize the results from three major deployments that we conducted over the 
past years [3,13,17]. We first summarize the experiments related to the accuracy of the distributed 
event detection algorithm and then proceed to the evaluation of the energy efficiency of the system. 


20.4.1 Detection Accuracy 


In [3], we attached 100 ScatterWeb MSB sensor nodes to the fence elements of a construction 
site near our institute. We exposed the WSN to four different classes of events: shaking the 
fence, kicking against the fence, leaning against the fence, and climbing over the fence. For 
the training, we chose a region of the fence that was free of any external obstructions and trained 
the WSN with 15 repetitions of each event. We compared the results of this experiment with two
prior experiments. In the first one, we used 10 sensor nodes attached to a fence to
detect 6 different events. The system did not support autonomous training at that time. Instead,
we relied on a custom-built heuristic classifier implemented in a rule-based middleware that was
manually configured to classify events based on human-visible patterns in the raw data. The second
experiment used for comparison was an additional lab experiment. Here, we
trained three sensor nodes to cooperatively recognize four different geometric shapes based on the
acceleration data measured by the sensor. The four shapes, comprising a square, a triangle, a circle, 
and the capital letter U, were drawn on a flat surface by physically moving the sensor nodes and 
then classified using a cooperative fusion classifier. There were three persons, each moving one 
sensor node along these shapes for a total of 160 runs. 

We collected the following metrics while exposing the system to the different classes of events 
(with TP = true positive, TN = true negative, FP = false positive, and FN = false negative): 


• Sensitivity $= \frac{\#TP}{\#TP + \#FN}$, also called recall, corresponds to the proportion of
  correctly detected events.
• Specificity $= \frac{\#TN}{\#TN + \#FP}$ corresponds to the proportion of correctly ignored events.
• Positive predictive value (PPV) $= \frac{\#TP}{\#TP + \#FP}$, also referred to as precision,
  corresponds to the probability that correctly detecting an event reflects the fact that the system
  was exposed to a matching event.
• Negative predictive value (NPV) $= \frac{\#TN}{\#TN + \#FN}$ corresponds to the probability that
  correctly ignoring an event reflects the fact that the system was not exposed to a matching event.
• Accuracy $= \frac{\#TP + \#TN}{\#TP + \#TN + \#FP + \#FN}$ corresponds to the proportion of true
  results in the population, that is, the sum of all correctly detected and all correctly ignored events.
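
The five metrics follow directly from the four counts. The function below is a small sketch; its name and the example counts in the usage note are illustrative and are not measurements from the deployments.

```python
from typing import Dict


def detection_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute the detection quality metrics from confusion counts.

    Raises ZeroDivisionError if a denominator is zero, for example
    when no positive events were observed at all.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }
```

For example, hypothetical counts of 40 true positives, 45 true negatives, 5 false positives, and 10 false negatives give a sensitivity of 0.8, a specificity of 0.9, and an accuracy of 0.85.
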


Figure 20.6 shows the sensitivity, specificity, PPV, NPV, and accuracy for three different 
deployments of our distributed event detection system. The algorithms and protocols as described 
in Section 20.3 are employed in the latter two deployments, i.e., in the lab prototype and in the
distributed event detection deployment. We can observe that the system achieves near-perfect 
results under lab conditions. Also, the latest deployment performs substantially better than the 
initial proof-of-concept setup with the rule-based classifier. The current system achieves an overall 
detection accuracy of 87.1%, an improvement of 28.8% over the proof of concept.

Figure 20.6 Detection quality of distributed event detection for different deployments.


20.4.2 Energy Consumption 


Our experiments in [13] deal with the energy consumption of a sensor node during distributed
event detection and recognition. To measure the energy consumption of the whole sensor board
circuit as accurately as possible, we soldered a 10 Ω shunt resistor into the supply line, which is
powered by a reference voltage of 5 V. A digital sampling oscilloscope (DSO) was attached to
measure the voltage across the shunt resistor. As the resistance and the voltage are known, we can
calculate the current and use it to calculate the electric power used by the sensor node over the time
of one DSO sample. By integrating the electric power over the duration of one system state, such as
packet transmission, sampling, or IDLE mode, we can accurately determine the energy needed and use
this information to approximate the energy consumption of the sensor node over a certain time. During
the event detection phase, we make use of the low-power modes of each of our components. In
detail, the sensor nodes use the MCU PD mode, which also shuts down all internal peripherals. The
wireless transceiver makes use of the WOR mode, which enables the processing of incoming data by
using a duty-cycle. All sensor nodes are aware of this duty-cycle and retransmit data until it is assured
that the timeslots have matched. The acceleration sensor is active and monitors the movements
of the fence elements to wake up the MCU and alert the application layer in case of suspicious
acceleration data that are different from the noise level.
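
The measurement procedure can be written down as a short calculation. The sketch uses the 10 Ω shunt and the 5 V supply from the text. Whether the voltage drop across the shunt was subtracted when computing the node power is not stated, so the subtraction below is an assumption, as are the function names.

```python
from typing import Iterable

R_SHUNT_OHM = 10.0  # shunt resistance from the text
V_SUPPLY = 5.0      # reference supply voltage from the text


def node_power_w(v_shunt: float) -> float:
    """Power drawn by the node for one DSO sample of the shunt voltage."""
    current_a = v_shunt / R_SHUNT_OHM   # Ohm's law
    v_node = V_SUPPLY - v_shunt         # assumption: shunt drop is subtracted
    return v_node * current_a


def energy_j(v_shunt_samples: Iterable[float], dt_s: float) -> float:
    """Integrate power over equally spaced samples (rectangle rule)."""
    return sum(node_power_w(v) * dt_s for v in v_shunt_samples)
```

Summing over all samples that belong to one system state, such as a packet transmission, yields the energy needed for that state.
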

An exemplary energy measurement is illustrated in Figure 20.7. During PD, a mean power
consumption of 9.0 mW is measured. During an event, the MCU is periodically utilized to fetch
acceleration data from the acceleration sensor to the MCU (206.25 mW). This is followed by
the feature extraction (350 mW) and classification (58.80 mW on average). As described in [3],
a maximum of seven sensor nodes is involved in a fence event. Hence, in the phase of feature
distribution one broadcast packet is sent (373.33 mW), and during classification up to six packets
are received (178.5 mW) from the neighborhood. Finally, the result is calculated and sent to the
base station. Afterward, the sensor node is recalibrated or, as in our example, the hysteresis function
has converged and the sensor node immediately returns to the detection mode. The average
duration of an event is about 10 s. Including sampling, feature extraction, distribution, and
classification, the average power consumption for one event is, thus, about 145.4 mW.

Figure 20.7 Energy consumption during event processing.

Figure 20.8 Energy consumption and network lifetime of different system configurations.

Figure 20.8 illustrates the average energy consumption and the resulting extrapolated network
lifetime. The underlying scenario for this calculation is a deployment of seven nodes in which
an event is generated and processed 5 times/h. For comparison, we also plot the numbers for a
deployment of the same sensor network, but without any event-related activity. As can be seen
in the figure, the lifetime of the network that employs centralized event detection is drastically
reduced in comparison to the idle network due to the increased energy consumption. In contrast,
the distributed event detection is able to reduce the per-node energy consumption by 77%, thus
increasing the lifetime of the network by a factor of 4 to an extrapolated total of 41 weeks. By applying
an acceleration logic (ACC-Logic), we are able to wake up the MCU in the case of an occurring
event and make use of PD during the remaining time. In this way, the distributed event detection
gains a lifetime improvement of about 16 times compared to a software logic calculated by the
MCU. This underlines the importance of clever techniques that wake up further sensor node
components only when they are required.
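
A rough duty-cycle estimate shows why keeping the node in PD between events matters. The sketch below uses only figures quoted in the text (9.0 mW in PD, about 145.4 mW during an event of about 10 s, 5 events/h). It is not the calculation behind Figure 20.8, it ignores effects such as the radio duty-cycle, and the battery energy passed to `lifetime_weeks` is a free parameter, not a value from the chapter.

```python
def average_power_mw(p_idle_mw: float, p_event_mw: float,
                     event_s: float, events_per_hour: float) -> float:
    """Time-weighted mean power over one hour."""
    active_s = event_s * events_per_hour
    return (p_event_mw * active_s + p_idle_mw * (3600 - active_s)) / 3600


def lifetime_weeks(battery_wh: float, avg_power_mw: float) -> float:
    """Extrapolated lifetime for a given usable battery energy."""
    hours = battery_wh * 1000 / avg_power_mw
    return hours / (24 * 7)


# 50 s of event processing per hour raise the mean from 9.0 mW to roughly 10.9 mW.
avg = average_power_mw(9.0, 145.4, 10, 5)
```

Under these assumptions the node spends less than 2% of each hour processing events, so the mean power stays close to the PD level.
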

In conjunction, these results underline the validity of our initial statement that the distributed 
event detection is able to jointly achieve the otherwise conflicting goals of high-accuracy event 
detection and long network lifetime. 


20.5 Conclusion 


Distributed event detection in WSNs, as discussed in this chapter, combines highly accurate event
detection with low energy consumption. Further, given the distributed design of the classification
algorithm, it is robust against failures of individual nodes as long as the network has been deployed 
in a sufficiently dense manner. The key advantage of this approach is, however, its generality: Since 
most of the architecture, that is, all components above the level of feature extraction, operates 
independently of the deployment-specific sensors, the system can be employed in a variety of 
scenarios. In addition to our main use of perimeter security, the system can also easily be trained, 
for example, to recognize human motions, spatial temperature distributions, or patterns in infrared 
readings. 

With systems like the one covered in this chapter, research into event detection in sensor 
networks moves into a phase of incremental improvements. Depending on the difficulty of the 
application scenario, detection accuracies beyond the 90% mark are already feasible with current
platforms. In order to push the envelope and work toward the 99% mark, careful refinements of the
sensing platform are necessary, for example, the choice of sensors and their mounting on the sensor
node. Furthermore, the design space of which features to use in which type of deployment needs
to be explored systematically, and the training process needs to be streamlined. In conclusion, one
can say that, given the overall capabilities present in current sensing platforms, intelligent sensor
networks are just around the next corner.


21.1 Introduction 


21.1.1 Historical Picture: Coverage and Tessellation 


Coverage may be defined as a task where the objective is to guarantee that a set of entities of 
interest (e.g., points, objects, or events) is completely covered. The covering is broadly defined. For 
example, it may be physical or using observation points. Coverage is one of the oldest problems in 
mathematics and physics. For example, in 1619, Johannes Kepler, a famous German mathematician 
and astronomer, published his seminal book entitled Harmonices Mundi that included the first study 
on tessellation [45]. The task of tessellation is a special coverage case where the goal is to cover 
infinite two-dimensional space using the repetition of a single or a finite number of geometric 
shapes. Of course, no overlaps or gaps are allowed. Probably, the most celebrated result related to 
tessellation was discovered by Yevgraf Fyodorov at the end of the nineteenth century. He presented
a proof that every periodic tiling of the plane belongs to one of 17 distinct groups of isometries.


21.1.2 Coverage and Sensor Networks 


Although coverage has a long and rich history, it only recently emerged as a premier computer science
research topic. This is a confluence of technology push and application pull. The technology push
was provided by the creation of sensor networks. This rapidly growing area provides means for
comprehensive surveillance of both objects and areas under reasonable cost and energy constraints.

The second part of the research and development impetus was provided by the rapid emergence
of security as one of the most important and desired system and application aspects. In a sense,
coverage is the fourth wave of information security. The first was created in 1976 by the introduction
of public key cryptography. It provided practical and theoretically sound techniques for ensuring
privacy of data storage and data communication. The second is related to system security. In a sense,
these techniques have a longer and richer history than public key cryptography. Recent emphasis
has been on hardware-based security and detection of malicious circuitry. The third wave aims at 
protection of the Internet and the WWW. Although this wave is by far the most diverse and covers 
issues from phishing to privacy, a significant emphasis has been on denial of service. 

The fourth wave, which has just started, is related to physical and social security using
large-scale sensing, computation, communication, and storage resources. It is often envisioned in the
form of multiple sensor networks that use (standard) wireless communication infrastructure to
enable transfer of data to computational clouds. While the exact system picture has been radically
changing (e.g., initially network processing of collected data was a dominant system paradigm), the
frontier component (sensor networks) has been constant in all efforts.

Coverage is naturally both a canonical sensor network task and the basis for numerous
physical and social security tasks. It has an extraordinarily broad basis, and numerous coverage
subtasks can be defined.

The concept of coverage was introduced by Gage, who defined three classes of coverage problems: 
(1) blanket coverage (also known as area coverage), where the goal is to have each point of the region 
be within a detection distance from at least one of the sensors (static sensors, static objects coverage); 
(2) sweep coverage, where the goal is to move a number of sensors across the region so as to maximize
the probability of detecting a target (mobile sensors, static objects); and (3) barrier coverage,
where the objective is to optimally protect the region from undetected penetration (static sensors,
mobile objects). In addition, one can pose a fourth possible definition (mobile sensors, mobile
objects). The last class of problems is not just practically very important, but also technically very
challenging. Its theoretical treatment requires several probabilistic models. Its practical addressing 
requires sound and realistic statistical models that consider correlations. 
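
The definition of blanket coverage can be checked numerically for a rectangular region. The sketch below samples the region on a regular grid and verifies that every grid point lies within the detection distance of at least one sensor. The grid sampling, the rectangular region, and the disk-shaped detection model are simplifying assumptions; a point between grid nodes may still be uncovered.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def is_blanket_covered(sensors: List[Point], detection_range: float,
                       width: float, height: float, step: float) -> bool:
    """Check grid points of [0, width] x [0, height] against all sensors."""
    nx = int(round(width / step))
    ny = int(round(height / step))
    for ix in range(nx + 1):
        for iy in range(ny + 1):
            point = (ix * step, iy * step)
            if not any(math.dist(point, s) <= detection_range for s in sensors):
                return False  # found a grid point no sensor can observe
    return True
```

For a single sensor at the center of a 2 by 2 region, a detection range of 1.5 covers the corners, which lie at a distance of about 1.41, whereas a range of 1.0 does not.
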

One can also envision many other generalizations of dynamic coverage problems. For example, 
a number of authors considered techniques for maximizing the lifetime of the network and, 
therefore, the length of the pertinent coverage. Also, coverage under multiple objectives and/or 
multiple constraints, most often related to sensing and communication, has been a popular topic. It
is important to note that technological trends may evolve so that communication ranges become much
longer than sensing ranges. Nevertheless, multiobjective coverage has tremendous practical importance.
For instance, it is a natural way to address common scenarios in which detection of an object or an
event can be accomplished only by using sensors of different modalities and therefore different properties.
Another important dimension is providing guarantees of proper functioning of the coverage system 
in the presence of faults or security attacks. 


21.1.3 Challenges in Solving Coverage Problems 


We place special emphasis on the following four types of challenges. 

Algorithmic challenges: Coverage problems are almost always intrinsically multidimensional, and
many of them also include a time dimension. Interestingly, some coverage problems
can be naturally mapped into equivalent combinatorial and, in particular, graph formulations. For
wide classes of coverage problems and, in particular, exposure problems, the most effective
techniques very often involve variational calculus and its discretized realization using dynamic programming.

Finally, in some applications it is important that the algorithms have localized versions
in which each sensor node contacts only a small subset of other nodes using high-quality communication
links in such a way that the overall global optimality is preserved completely or within a certain
application bound. These types of coverage problems are most relevant in situations where one of
the objectives is low-energy operation or preservation of the communication bandwidth. Also, this 
type of operation may be important when security is one of the important requirements. Our last 
remark is that probabilistic and statistical analysis of coverage algorithms is increasingly important. 

Modeling challenges: There are two main aspects that require careful modeling decisions. The 
first is the modeling of sensitivity of sensors. Of course, for different types of sensors different types 
of models are more appropriate. Initially, many coverage tasks were treated under the assumption that
detection is binary, that is, an object of interest is either observed or not. Subsequently,
much more comprehensive sensing models were introduced. For example, exposure requires that an
object of interest is under surveillance in such a way that an integral of closeness over time is above
a user-specified threshold. Also, the directionality of some types of sensors was recognized. Of course,
more and more complex models can be and should be addressed. However, as is often the case in
statistics, a more complex sensing model does not imply a more realistic problem formulation and
may significantly reduce (or enhance) the application domain.

The other important modeling issue is related to targeted objects and terrain. For example, 
in many applications, the mobility models are of prime importance. It is common to start from 
simple and intuitive models and keep increasing their complexity. It is worth mentioning that
mobility models, unfortunately, have a long history of being not just highly speculative but,
in some cases, plainly counterproductive.

System challenges: Papers at top sensor network venues are customarily divided into two groups:
theory and system. Quite often, theory papers are considered elegant and mathematically well
founded but of rather low practical relevance. System papers, on the other hand, are primarily
based on complete and demonstrated implementations that require unacceptably high levels of
abstraction and simplification. So, the first and most important system challenge is to combine
the useful properties of the previous generation of both system and theory papers while eliminating past
and some of the current problems.

Other premier system problems include low-cost realization and energy efficiency. The latter
metric is further extended to include low power requirements, in particular in self-sustainable
coverage systems.

Security challenges: Security is one of the premier requirements in many applications and its 
relative role is rapidly increasing. It already ranges from privacy and trust to resiliency against 
hardware, software, and physical attacks. Very often, sensor networks used to ensure coverage
are unattended or may even be deployed in hostile environments. Particularly interesting is the
situation when two or more parties are observing each other and simultaneously aim to ensure high
coverage while preserving the privacy of their actions. We expect that game theory techniques will
soon be used in this context.


21.1.4 Focus of This Survey 


In summary, coverage has a great variety of potential formulations and is a premier sensor network 
and emerging physical security task. In this survey we have three major objectives. The first is
to survey the coverage tasks and proposed techniques that are most popular and most important in
terms of applications. There are already several thousand coverage techniques, so it is not
possible to be comprehensive. Instead, we focus on the most effective techniques that
target the most generic and pervasive coverage formulations.

The second goal is to try to establish the place of coverage in the global picture and its relationship 
with other sensor network, security, and system design tasks and applications. Our final target is 
to identify and provide a research impetus for the most important and challenging new coverage 
research directions. 


21.2 The Coverage Problem 


In this section, we discuss the importance of the coverage problem in sensor networks and briefly 
review the topic of static coverage. In static coverage, the goal is to place the smallest number 
of sensors in such a way that an area of interest is observable. In comparison, dynamic coverage 
addresses the situation in which either the sensors or the objects are allowed to move in the area 
of interest. A special case of dynamic coverage is the exposure problem in which the detection 
is accomplished if an integral over time of a specific sensing function is large enough to ensure 
detection and possibly the characterization of the pertinent object. 


21.2.1 Historical Perspective


As we indicated in Section 21.1, coverage is an optimization problem with a long
history in mathematics and crystallography and, more recently, in robotics, computational geometry
(e.g., art gallery problems), and television and wireless networks. However, interest
in coverage received a tremendous impetus with the emergence of sensor networks
around the turn of the millennium.

Table 21.1 provides the quantification of our claim. It shows the number of sensor coverage 
papers according to Google Scholar. We see that while the overall number of papers is relatively 
constant per year, the number of papers with words “coverage in sensor networks” has experienced 
consistent growth and increased by more than 30 times in the last decade even when normalized 
against slight growth of the overall number of papers. The overall number of papers is actually 
increasing every year, but nontrivial latency in paper indexing hides this growth. It also results in 
reporting somewhat understated growth in the number of coverage papers. 

There have been several survey papers completely dedicated to coverage in sensor networks
[9,16,35,41]. In addition, several ultra-popular comprehensive surveys of sensor networks devoted
substantial space to coverage [5,7,33,64,99]. Also, a large number of surveys have been published
on more specific aspects of coverage [32,33], in particular using visual sensors [4,19,85] and
energy-efficient coverage [6,27,38,61].


Table 21.1 The Number of Sensor Coverage Papers According to the Google Scholar Database

Year    Coverage    Total
2001    190         2,670,000
2002    343         3,020,000
2003    681         3,100,000
2004    1440        3,120,000
2005    2460        2,970,000
2006    3470        3,040,000
2007    4360        2,950,000
2008    5190        2,810,000
2009    6020        2,510,000
2010    6990        2,400,000
2011    6750        3,100,000
2012    492         205,000

Note: The first column indicates the year. The last two columns indicate the number of papers that
address coverage in sensor networks and the total number of papers in the database,
respectively. The data for 2012 include only publications indexed in January.


21.2.2 Applications and Architectures 


Sensor networks provide a bridge between computational and communication infrastructures (e.g., 
the Internet) and physical, chemical, and biological worlds. The number of potential applications 
is unlimited. Most often, environmental, infrastructure security (e.g., pipelines and buildings), and
military and public security are addressed. More recently, wireless health and medical applications 
have emerged as one of the most popular research directions. 

Initially, Internet research had a dominating impact on wireless sensor network research.
Energy has been recognized as one of the most important design metrics. In addition, there has been 
an emphasis on efficient usage of bandwidth. Ultra-low-power operation of wireless sensor networks 
was the focus of many wireless sensor network efforts. Therefore, the ultra-low-power node with 
very short communication ranges was accepted as the preferred architecture building block. 

However, in the last several years it has been widely recognized that rapid progress in wireless
mobile networks provides numerous advantages. For example, mobile phone-based participatory
sensing that involves human interaction has emerged as the dominant architecture paradigm.

Both applications and architectures have profound ramifications on how coverage problems 
are formulated and addressed. For example, the use of mobile phone infrastructure eliminated 
limitations and concerns about communication range that is now much higher than the sensing 
range of essentially all sensors. Also, the need for localized algorithms is greatly reduced and much 
more complex definitions of coverage that require much higher processing resources and energy can 
now be realistically addressed. On the other hand, latency has gained importance over throughput. 

Also, each type of application requires new definitions of coverage. For example, medical 
applications can benefit little from traditional notions of coverage. In order to establish credible 
medical diagnosis, significantly more complex processing is needed that blurs distinctions between 
coverage and sensor fusion. It also introduces many new aspects such as sizing of sensors and its 
impact on coverage. 


21.2.3 Real-Time Coverage 


Operation in real time is essential for a majority of coverage applications that use sensor networks.
Surprisingly, this topic still does not receive a proportional amount of research effort. This is
unexpected, in particular, since one of the three tracks of the most prestigious real-time conference,
the Real-Time Systems Symposium, is dedicated to sensor networks. One of the first and most
influential papers in this domain is by Jeong et al. [43], which addresses the problem of observing a
network of actual roadways on which vehicles move at up to a specified maximal speed. Under a set of
assumptions that include the maximal car density, the goal is to ensure that all intruding targets are detected
before they reach any of the protection points. The objective is to maximize the lifetime of the sensor
network that is used for coverage. The approach uses the Floyd-Warshall algorithm to
compute all-pairs shortest paths. In order to maximize the lifetime of the network,
different sensors are assigned to different duty-cycle schedules. Jeong and his coauthors presented
both centralized and localized algorithms for early detection of targets on a graph (i.e., highway or
street network). Zahedi et al. [100] further explored the trade-offs between the quality of coverage
and the duty cycle (energy) of the sensors.
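
The all-pairs shortest-path step mentioned above can be sketched as follows. This is only a minimal illustration and not the algorithm of [43]: the road graph, the travel times, and the names `all_pairs_shortest` and `detection_deadline` are hypothetical, and we assume that the travel time of a road segment is its length divided by the maximal vehicle speed. The smallest travel time from an entry point to a protection point is the time budget within which an intruder must be detected.

```python
import math

def all_pairs_shortest(n, edges):
    """Floyd-Warshall on an undirected road graph with n intersections.

    edges: iterable of (u, v, travel_time), where travel_time is assumed
    to be segment length divided by the maximal vehicle speed.
    Returns dist, where dist[u][v] is the minimum travel time from u to v.
    """
    dist = [[math.inf] * n for _ in range(n)]
    for u in range(n):
        dist[u][u] = 0.0
    for u, v, t in edges:
        dist[u][v] = min(dist[u][v], t)
        dist[v][u] = min(dist[v][u], t)
    # Allow intermediate intersections 0..k one at a time.
    for k in range(n):
        for i in range(n):
            dik = dist[i][k]
            for j in range(n):
                if dik + dist[k][j] < dist[i][j]:
                    dist[i][j] = dik + dist[k][j]
    return dist

def detection_deadline(dist, entry_points, protection_points):
    """Shortest time in which an intruder can reach any protection point."""
    return min(dist[e][p] for e in entry_points for p in protection_points)

# Hypothetical 4-intersection road network (travel times in seconds).
roads = [(0, 1, 10.0), (1, 2, 5.0), (0, 2, 20.0), (2, 3, 7.0)]
dist = all_pairs_shortest(4, roads)
deadline = detection_deadline(dist, entry_points=[0], protection_points=[3])
```

A duty-cycle schedule would then have to guarantee that some sensor on every route is awake at least once within this deadline; how [43] does this is not reproduced here.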

Trap coverage is a very interesting and natural formulation of coverage that is related to real-time
detection and, in particular, latency of detection. It is also a way to address approximate coverage
when the number of available sensors is pre-specified. Until now we have mainly discussed coverage
techniques in which complete coverage of a targeted field is the objective. In trap coverage, holes
in coverage are allowed, but only if their number and their size are below specified measures.
One such measure that captures latency of detection is the time that an intruder spends in straight-line
travel at a specified speed before being detected. Recently, this problem has been addressed
both with and without the assumption that energy efficiency is one of the requirements [10,57].


21.2.4 Static Coverage 


Although our survey is focused on dynamic coverage in sensor networks, it is important to discuss 
static coverage in which the goal is to cover a specific area using the smallest number of sensors. 
An alternative formulation is one in which the goal is to cover a maximum subarea of a given area 
using a specified number of sensors. 

Although static coverage is probably conceptually the simplest possible formulation of any coverage
problem, almost all of its instantiations are still NP-complete. For example, these instantiations
can often be mapped to the dominating set problem. Interestingly, when we consider coverage of
a rectangular area using disks, the complexity of the corresponding optimization is not known.

One of the first approaches to address static coverage was presented by Slijepcevic and Potkonjak
[81]. They proposed two techniques: one uses simulated annealing and the other employs integer
linear programming (ILP). In addition, D. Tian, first as a student and later with his research group,
proposed a number of techniques for static coverage [89,90].


21.3 Barrier Coverage 


In barrier coverage, the objective is to protect the area from unauthorized penetration. We discuss 
in detail several types of barrier coverage including perimeter coverage, where the objective is to 
cover with sensors a narrow strip along the boundary of the region; the maximum breach path 
problem, where the goal is to find a path that maximizes the minimum distance to any sensor; and 
the minimum exposure path problem, whose objective is to find a path of minimum exposure, 
where the exposure of the path is defined as the integral of the sensing signal along that path. 


21.3.1 Perimeter Coverage 
21.3.1.1 Problem Formulation 


The objective of perimeter coverage is to study ways to detect an intrusion into a protected area 
by placing sensors near the border of the monitored region. There are two aspects of that problem: 
the placement problem asks to determine a placement of the sensors that offers optimal or near 
optimal protection for given resources or costs, and the assessment problem asks, given a placement 
of sensors, to evaluate how well they protect the area. 

Instead of placing sensors on the boundary line, most authors consider placement in
a belt area, a narrow region that contains the boundary and lies between two parallel lines, which we refer
to as the outside and the inside of the belt, respectively. If the
boundary of the belt region is connected, the belt is called open, and otherwise it is called closed
(Figure 21.1). We refer to the short lines in an open-belt region connecting the outside to the
inside boundary as the left and the right boundary, respectively. A belt with inside and outside
boundaries $l_1$ and $l_2$, respectively, has width $w$ if for each point $p_1$ in $l_1$ and each point $p_2$ in $l_2$,
$\mathrm{dist}(p_1, l_2) = \mathrm{dist}(p_2, l_1) = w$. Here, $\mathrm{dist}(p_i, l_j)$ is defined as the minimum distance between $p_i$ and
any point in $l_j$.


Figure 21.1 Types of belts depending on the boundary type: (a) open belt, when the boundary
is connected, and (b) closed belt, when the boundary is disconnected.

Since any coverage of the whole area also covers the belt and the belt region is typically much 
smaller, it is clear that perimeter coverage is often much more cost-effective than the full-area 
coverage. 

Kumar et al. [55,56] were among the first to study the perimeter coverage problem in detail. They
define two versions of the problem. The weak k-barrier coverage version considers only breaching
paths with lengths equal to the belt width (called orthogonal paths). The rationale behind this
restriction is that an intruder without prior knowledge of the
location of the sensors will likely choose an orthogonal path, since such a path is shortest and hence
minimizes the likelihood of detection. The strong k-barrier coverage version considers all paths
crossing the complete width of the belt (called crossing paths) as possible breach paths. A region
is weakly k-barrier covered (respectively, strongly k-barrier covered) if every orthogonal (respectively,
every crossing) path crosses the sensing region of at least k sensors. We call the maximum value of
k for which the region is k-barrier covered the strength of the coverage.


21.3.1.2 Strong k-Barrier Coverage 


Kumar et al. [55,56] consider two versions of the strong k-barrier coverage placement problem: a 
deterministic and a probabilistic one. In the deterministic version, sensors are placed on explicitly 
determined locations, while in the probabilistic one they are placed randomly according to a given 
probability distribution. 

For the deterministic version, they prove that an optimal placement of the sensors in an open-belt
region is on a set of k shortest paths, called separating paths, that separate the outside from the
inside portion of the belt, so that the sensing regions of the sensors touch or overlap inside the belt
(Figure 21.2). In the case where the sensing region of each sensor is a disk of radius r, they also
prove that the smallest number of sensors necessary and sufficient to cover an open-belt region is
$k\lceil s/2r \rceil$, where s is the length of a shortest separating path.
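
As a small worked illustration of the deterministic bound, the sketch below reads the count as k times the ceiling of s/(2r) and places sensors evenly along one separating path. It assumes, purely for illustration, that the separating path is a straight segment of length s; the function names are hypothetical and not taken from [55,56].

```python
import math

def min_sensors_strong_barrier(k, s, r):
    """Sensor count k * ceil(s / (2r)) for strong k-barrier coverage.

    k: required barrier strength, s: length of a shortest separating path,
    r: sensing radius of each (disk-shaped) sensor.
    """
    if k < 1 or s <= 0 or r <= 0:
        raise ValueError("k must be >= 1 and s, r must be positive")
    return k * math.ceil(s / (2 * r))

def place_on_straight_path(s, r):
    """Offsets of sensor centres along a straight separating path.

    Assumption: the path is a straight segment of length s. The centres
    are spaced at most 2r apart, so neighbouring disks touch or overlap,
    and the first and last disks reach the two ends of the path.
    """
    m = math.ceil(s / (2 * r))
    spacing = s / m                      # spacing <= 2r by construction
    return [spacing * (i + 0.5) for i in range(m)]

count = min_sensors_strong_barrier(k=2, s=100.0, r=15.0)
offsets = place_on_straight_path(s=100.0, r=15.0)
```

For a strong 2-barrier, the same offsets would be reused on a second, disjoint separating path.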

Figure 21.2 Placing the sensors on two separating paths results in a strong 2-barrier coverage
of the region.


For the probabilistic version of the placement problem, Liu et al. [60] show that whether a
random placement of sensors in a rectangular belt yields a k-barrier coverage depends on the ratio
between the length $h$ and the width $w = w(h)$ of the belt. Specifically, if the sensors are distributed
according to a Poisson point process with density $\lambda$, then if $w(h) = \Omega(\log h)$, the region is
k-barrier covered with high probability if and only if the density $\lambda$ of the sensors is above a certain
threshold. If, on the other hand, $w(h) = o(\log h)$, the region does not have a barrier coverage with
high probability for any $\lambda$. With high probability (w.h.p.) means that the probability tends to
1 as $h$ tends to infinity. The strength of the coverage for a fixed density $\lambda$ grows proportionally
with $w(h)/r$.

Another interesting question is, given a belt and the positions of a set of sensors placed in it, to
determine whether the sensors provide a barrier coverage and to find the strength of such a coverage.
Kumar et al. [55,56] answer that question for open-belt regions by reducing the aforementioned
problem to the problem of finding a set of node-disjoint paths in a graph. They define a coverage
graph G whose nodes are the sensors of the network and whose edges connect all pairs of nodes
whose corresponding sensors have overlapping sensing regions. They also define two additional
nodes u and v and edges between u (respectively v) and all nodes whose corresponding sensing
regions intersect the left (respectively right) boundary of the belt. Using Menger's theorem [96,
p. 167], they prove that k-barrier coverage of the belt by the given sensors is equivalent to the
existence of k vertex-disjoint paths between u and v in G. Moreover, computing the maximum
number k of vertex-disjoint paths between u and v in G can be done in time $O(k^2 n + m)$, where n
and m are the numbers of nodes and edges of G. However, the same proof cannot be used for
the closed-belt case since Menger's theorem is not applicable to that case. The assessment problem
for strong k-barrier coverage for closed-belt regions is still an open problem.
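
The reduction to vertex-disjoint paths can be sketched as follows. This is a minimal illustration under stated assumptions rather than the algorithm of [55,56]: the belt is taken to be a rectangle whose left and right boundaries are x = 0 and x = length, all sensors have the same disk radius r, and the number of vertex-disjoint u-v paths is found with a plain augmenting-path maximum flow after splitting every sensor node into an in-node and an out-node (a standard technique that we substitute here, without the running time quoted above). The name `barrier_strength` is hypothetical.

```python
from collections import deque

def barrier_strength(sensors, r, length):
    """Maximum number of vertex-disjoint u-v paths in the coverage graph.

    sensors: list of (x, y) positions; r: common sensing radius;
    the belt is assumed to span x = 0 (left) to x = length (right).
    """
    n = len(sensors)
    U, V = 2 * n, 2 * n + 1              # virtual nodes u and v
    cap = {}                             # residual capacities of arcs
    adj = [[] for _ in range(2 * n + 2)]

    def add_arc(a, b):
        cap[(a, b)] = 1                  # forward arc, unit capacity
        cap[(b, a)] = 0                  # residual (reverse) arc
        adj[a].append(b)
        adj[b].append(a)

    for i, (x, y) in enumerate(sensors):
        # Sensor i is split into in-node 2i and out-node 2i+1 so that
        # at most one path can pass through it.
        add_arc(2 * i, 2 * i + 1)
        if x - r <= 0:                   # disk meets the left boundary
            add_arc(U, 2 * i)
        if x + r >= length:              # disk meets the right boundary
            add_arc(2 * i + 1, V)
        for j in range(i + 1, n):
            xj, yj = sensors[j]
            if (x - xj) ** 2 + (y - yj) ** 2 <= (2 * r) ** 2:
                add_arc(2 * i + 1, 2 * j)    # overlapping sensing disks
                add_arc(2 * j + 1, 2 * i)

    flow = 0
    while True:
        parent = {U: None}               # BFS for an augmenting path
        queue = deque([U])
        while queue and V not in parent:
            a = queue.popleft()
            for b in adj[a]:
                if b not in parent and cap[(a, b)] > 0:
                    parent[b] = a
                    queue.append(b)
        if V not in parent:
            return flow
        b = V
        while parent[b] is not None:     # push one unit along the path
            a = parent[b]
            cap[(a, b)] -= 1
            cap[(b, a)] += 1
            b = a
        flow += 1
```

For example, two separate rows of overlapping sensors that each reach from the left to the right boundary give a strength of 2, and removing the last sensor of one row reduces the strength to 1.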


21.3.1.3 Weak k-Barrier Coverage 


Weak barrier coverage considers only crossing paths that are perpendicular to the belt boundary. In
[56], Kumar et al. consider sensors that are Poisson distributed with density $np$ and ask
which values of $np$ produce a weak barrier coverage with high probability. We can think of the
parameter $n$ as corresponding to the total number of the sensors and $p$ as the probability of each
sensor being awake at any given time. Kumar et al. define the function

$$c(s) = \frac{2npr}{s \log(np)}$$

and show that, for a belt of width $1/s$ and for any $\varepsilon \in (0, 1)$, if

$$c(s) > 1 + \frac{(\log(\log np))^{\varepsilon} + (k - 1)\log(\log np)}{\log(np)} \qquad (21.1)$$

for sufficiently large s, then all orthogonal lines crossing the belt are k-covered with high probability
as $s \to \infty$. On the other hand, if

$$c(s) < 1 - \frac{(\log(\log np))^{\varepsilon} + \log(\log np)}{\log(np)} \qquad (21.2)$$

for sufficiently large s, then there exists a non-1-covered orthogonal crossing line in the belt with
high probability as $s \to \infty$. Condition (21.1) is a sufficient condition for achieving weak k-barrier
coverage and condition (21.2) provides a necessary condition (if the inequality is reversed) for
weak 1-barrier coverage. Evidently, there is a gap between the two bounds and finding an optimal
weak k-barrier coverage condition is an interesting open question.

As noted in [56], the right-hand sides of (21.1) and (21.2) tend to 1 as $s \to \infty$. Hence,
asymptotically the critical value for $c(s) = 2npr/(s \log(np))$ is 1, meaning that there should be at
least $\log(np)$ sensors deployed in the r-neighborhood of each orthogonal crossing line in order to
produce a weak barrier coverage of the region.
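
To make the asymptotic critical value concrete, the following sketch evaluates c(s) = 2npr/(s log(np)) and searches numerically for the density np at which c(s) reaches 1. It is an illustration only: it ignores the lower-order terms of (21.1) and (21.2), treats np as a single real parameter, and uses hypothetical function names.

```python
import math

def c(s, np_density, r):
    """c(s) = 2 n p r / (s log(np)), with np passed as one parameter."""
    return 2 * np_density * r / (s * math.log(np_density))

def critical_density(s, r, tol=1e-9):
    """Smallest np > e with c(s) >= 1, found by bisection.

    x / log(x) is increasing for x > e, so c(s) is increasing in np on
    that range and the crossing point is unique there.
    """
    lo = math.e
    if c(s, lo, r) >= 1:
        return lo
    hi = 2 * lo
    while c(s, hi, r) < 1:               # grow until the threshold is bracketed
        hi *= 2
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2
        if c(s, mid, r) >= 1:
            hi = mid
        else:
            lo = mid
    return hi

# Hypothetical belt parameter s = 100 and sensing radius r = 1.
np_star = critical_density(100.0, 1.0)
```

At the returned density, the expected number of sensors in the r-neighborhood of an orthogonal crossing line, 2npr/s, equals log(np), which matches the interpretation given above.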

In a different approach to the problem, Li et al. [58] found a lower bound on the probability 
for a weak k-barrier coverage, given the size of the region and the number and the distribution of 
the sensors. Specifically, they show that if the belt region is a rectangle with dimensions s x 1/s, 
r is the sensing radius, the sensors are distributed according to a Poisson point process with density 
$np$, and $B_k$ denotes k-barrier coverage, then


n 2 
k—1 a k-1 i 
Pr(By) = FE Sp SEY | [a SS E 
; j! ; j! 


Given the placement of the sensors, a natural question to ask is whether those sensors provide
a weak k-barrier coverage. Answering that question is easier in the weak barrier coverage case than
the similar question for strong barrier coverage. The reason is that, for weak coverage, the vertical
positions of the sensors do not matter as only vertical paths are considered. Hence, the problem
can be reduced to a one-dimensional case: just consider the projections of the sensor positions onto
the line segment S defining the internal (or external) belt boundary and determine whether those
projections k-cover that segment. Li et al. [58] present a simple algorithm that considers the set Q
of the endpoints of all sensing intervals on S, that is, for each point x on S corresponding to a sensor
projection, we add points x - r and x + r to Q. Then S is swept from left to right keeping track of
how many sensors cover each point. The resulting algorithm has time complexity of O(N log N),
where N is the number of the sensors.
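
The sweep described above can be sketched in a few lines. This is our own minimal rendering of the idea, not the code of [58]: the segment S is assumed to be the interval [left, right] on the x-axis, all sensors share the radius r, and the function name is hypothetical. Sorting the 2N interval endpoints dominates the cost, which gives the O(N log N) bound.

```python
def weak_barrier_strength(xs, r, left, right):
    """Largest k such that every point of [left, right] is k-covered.

    xs: x-coordinates of the sensor projections onto the segment S;
    r: sensing radius. Returns 0 if some part of S is uncovered.
    """
    if not left < right:
        raise ValueError("segment must have positive length")
    events = []
    for x in xs:
        a, b = max(x - r, left), min(x + r, right)   # clip to the segment
        if a < b:
            events.append((a, +1))       # sensing interval opens
            events.append((b, -1))       # sensing interval closes
    events.sort()

    strength = float("inf")
    active = 0                           # intervals covering the current gap
    prev = left
    for pos, delta in events:
        if pos > prev:
            # Every point strictly between prev and pos is covered by
            # exactly `active` sensing intervals.
            strength = min(strength, active)
            prev = pos
        active += delta
    if prev < right:                     # uncovered tail of the segment
        strength = min(strength, active)
    return strength

# Five projections with radius 1.5 on the segment [0, 6].
k = weak_barrier_strength([1, 3, 5, 2, 4], 1.5, 0, 6)
```

The belt is weakly k-barrier covered exactly when the returned value is at least k.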


21.3.1.4 Other Perimeter Coverage Results 


Kumar et al. establish in [56] that it is not possible to determine locally whether a region is strongly 
k-barrier covered or not. This is in contrast to the full-area coverage case, where a “yes” answer is 
not possible, but a “no” answer is, that is, it is possible in the full coverage case to determine that 
a region is not k-covered. In order to deal with the problem of local barrier coverage, Chen et al. 
[20] introduce the notion of L-local barrier coverage. Informally, having L-local barrier coverage 
requires that any path contained in a box of length at most L be covered (or k-covered). Hence,
L-local barrier coverage generalizes both weak barrier coverage (L equal to zero) and strong barrier
coverage (L equal to the belt length). If L is sufficiently small, it is possible to locally determine if
the region is not L-locally k-barrier covered, as proved in [20].

Chen et al. [21] use the idea of L-local barrier coverage in order to quantify the quality of
k-barrier coverage. Previously, the quality measure has been binary: 1 if there is k-barrier coverage
and 0 if there is not. Chen et al. define the quality of k-barrier coverage as the maximum value of
L for which the belt is L-local k-barrier covered. If there is no such L, then they define the quality
as -1. They design an algorithm that computes the quality given the sensor positions and a value
for k. Their algorithm also identifies weak regions that need extra sensors. The property of being 
able to quantify the quality of barrier coverage is analyzed from another perspective and in much 
more detail in the following subsections. 


21.3.2 Maximum Breach Path 


The maximum breach path problem asks for the least covered (the most vulnerable) path between
a pair of points. In this context, the measure of how well a path p is covered is the minimum
distance from any point of p to any of the sensors. The key conceptual difficulty is that there is
a continuum of possible paths for the intruder. Nevertheless, this is one of the first problems
of coverage in sensor networks that has not only been addressed but actually solved optimally.

The key idea behind the solution is remarkably simple. The crucial step is to translate this
continuous computational geometry problem into an instance of a graph theoretical problem, which
is easily accomplished using the notion of a Voronoi diagram. A Voronoi diagram is a tessellation
of the space using piecewise linear connected components. If we have two sensors, A and B, the
line of separation between them is orthogonal to the segment that connects them and passes through
the midpoint of that segment. It is easy to see that, when computing the maximum breach path,
it is sufficient to consider only Voronoi diagram edges and, more specifically,
their weights, where the weight of an edge is the distance from the closest point on that edge to
either one of the two sensors that define it. The justification for this observation is that if the intruder
leaves the Voronoi diagram edges during the traversal, it will come closer to at least one of
the sensors that define the pertinent Voronoi diagram edge.

Now, in order to find whether the system of deployed sensors has a breach of value $l$, all
that is required is to check whether the graph defined on top of the Voronoi
diagram contains a path from the starting point to the ending point on which no edge weight is
smaller than $l$. There are many ways to
accomplish this task. Conceptually, probably the simplest is to add edges iteratively, from the largest
weight to the smallest, until there is a path from the starting point to the ending point; the weight
of the last edge added is then the breach value. There are several
important observations about this approach. One is that one can easily consider the case where
different sensors have different sensitivity ranges, or one can even superimpose a grid over the area
and define for each field in the grid the level of sensitivity over a single or multiple sensors. All these
problems can be easily solved using dynamic programming. A much more in-depth technical
presentation of these algorithms can be found in [62,65].
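
The graph step can be sketched as a bottleneck-path computation. Because the breach of a path is the minimum distance to a sensor along it, the sketch below inserts edges from the heaviest to the lightest and stops as soon as the start and end vertices become connected; a union-find structure tracks connectivity. This is an illustration under assumptions and not the code of [62,65]: the Voronoi edges and their weights are taken as given input, and the function name is hypothetical.

```python
import math

def maximal_breach(num_vertices, edges, start, goal):
    """Largest b such that a start-goal path uses only edges of weight >= b.

    edges: list of (u, v, weight), where weight is assumed to be the
    distance from the Voronoi edge (u, v) to its nearest sensor.
    Returns None if start and goal are not connected at all.
    """
    if start == goal:
        return math.inf
    parent = list(range(num_vertices))   # union-find forest

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]    # path halving
            a = parent[a]
        return a

    # Add edges in order of decreasing weight.
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        parent[find(u)] = find(v)
        if find(start) == find(goal):
            return w                     # weight of the last edge added
    return None

# Hypothetical Voronoi graph with two routes from vertex 0 to vertex 3.
voronoi_edges = [(0, 1, 3.0), (1, 3, 1.0), (0, 2, 2.0), (2, 3, 2.5)]
breach = maximal_breach(4, voronoi_edges, start=0, goal=3)
```

In this example the route through vertex 1 comes within distance 1.0 of a sensor, while the route through vertex 2 never comes closer than 2.0, so the breach value is 2.0.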


21.3.3 Minimum Exposure Path 


As noted earlier, one of the key degrees of freedom in defining the coverage problem
is the way in which we define the sensitivity with respect to a single or multiple sensors.
Exposure is a generalization of dynamic coverage in the sense that we ask whether it is
possible to find a path through a particular field covered with sensors in such a way that the total
integral of exposure over time to sensing by all relevant sensors is below a user-specified value.

There are two ways to address this problem that are conceptually similar but very different in terms
of implementation. The first one rasterizes the pertinent field into a grid
or some other structure in which all points in each cell are sufficiently close to each other. This is
easy to accomplish by decreasing the size of individual cells. For each small cell, we can calculate
the amount of exposure for any given period of time. Now, under the natural assumption of
constant speed, we can use dynamic programming to find the path of minimal exposure from
a starting point s to a destination point d. This task can be accomplished in polynomial time,
with the exact running time depending on additional constraints that may be imposed on the
definition of exposure. This
solution was presented by Meguerdichian, who subsequently changed his last name to Megerian,
in [63,68].
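
A grid-based search of this kind can be sketched as follows. The sketch is an illustration under several assumptions and does not reproduce [63,68]: the sensing intensity is assumed to be the sum over all sensors of 1/(d^2 + eps), the intruder moves at unit speed between the 8 neighbors of a grid vertex, the exposure of a step is approximated by the trapezoid rule, and Dijkstra's algorithm stands in for the dynamic-programming step. All function and parameter names are hypothetical.

```python
import heapq
import math

def intensity(x, y, sensors, eps=1e-3):
    """Assumed sensing model: sum over sensors of 1 / (d^2 + eps)."""
    return sum(1.0 / ((x - sx) ** 2 + (y - sy) ** 2 + eps)
               for sx, sy in sensors)

def min_exposure_path(sensors, size, cells, start, goal):
    """Approximate minimum exposure path on a (cells+1) x (cells+1) grid.

    size: side length of the square field; start, goal: (i, j) grid
    vertices. Returns (exposure, list of grid vertices on the path).
    """
    h = size / cells                     # grid spacing
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue                     # stale queue entry
        if node == goal:
            break
        i, j = node
        here = intensity(i * h, j * h, sensors)
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (di or dj) and 0 <= ni <= cells and 0 <= nj <= cells:
                    step = h * math.hypot(di, dj)
                    there = intensity(ni * h, nj * h, sensors)
                    # Trapezoid rule for the exposure integral of the step.
                    nd = d + 0.5 * (here + there) * step
                    if nd < dist.get((ni, nj), math.inf):
                        dist[(ni, nj)] = nd
                        prev[(ni, nj)] = node
                        heapq.heappush(heap, (nd, (ni, nj)))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]

# One sensor in the middle of a 10 x 10 field, crossing from left to right.
field_sensors = [(5.0, 5.0)]
exposure, path = min_exposure_path(field_sensors, 10.0, 10, (0, 5), (10, 5))
```

Refining the grid (a larger `cells`) trades running time for accuracy, in line with the remark above about decreasing the size of the individual cells.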

Another very interesting approach, by Veltri et al. [91], uses variational calculus to solve the
exposure problem in a way that guarantees the correct solution. The key idea is to solve a small
number of simplified problems, such as one where very few sensors are used, and to concatenate
these locally optimal solutions into one that is globally optimal.

An approximation algorithm for the exposure problem with provable accuracy and polynomial
running time was designed by Djidjev [29]. In this algorithm, the points are not placed on a
grid covering the region (rasterization), as in the previous algorithms, but only on the edges of a
Voronoi diagram for the set of the sensors. This, in effect, replaces a two-dimensional mesh by
a one-dimensional mesh, significantly reducing the computational complexity of the algorithm.
For any given $\varepsilon > 0$, the algorithm from [29] can find a path with exposure at most $1 + \varepsilon$
times the optimal. Hence, by reducing the value of $\varepsilon$, one can get paths with exposures
arbitrarily close to the optimal. The running time of the algorithm is proportional to $n\varepsilon^{-2}\log n$,
assuming that the Voronoi diagram does not have angles very close to zero.


21.4 Coverage by Mobile Sensors 


In the mobile version of the coverage problem, the goal is to cover a region of interest with mobile 
sensors so that the trajectories of the sensors go through points or areas of interest at predetermined 
time intervals, form barriers, or relocate themselves to better static locations. 


21.4.1 Sweep Coverage 


Li et al. [59] consider the following problem, which they call the sweep coverage problem: There are
n mobile sensors located in a region that contains m points of interest (POIs) that need to be
monitored. The sensors move at the same constant speed v and a POI is considered covered at a
given time if a mobile sensor is at that location at that time. Given a coverage scheme (schedule), a
POI is considered t-sweep covered if it is covered at least once in every time interval of length t. The
goal is to design a coverage scheme so that each of the m POIs is t-sweep covered. A more general
version of the problem specifies an individual sweep period $t_i$ for each POI.

It is proved in [59] that the t-sweep coverage problem is NP-hard by reducing the traveling 
salesman problem to it. A stronger result from the same paper [59] is that the t-sweep 
coverage problem cannot be approximated within a factor of less than 2 unless P = NP. It is also 
shown that for any $\varepsilon > 0$ there exists a polynomial-time algorithm that solves the t-sweep coverage 
problem within a factor of $2 + \varepsilon$. The algorithm uses the $(1 + \varepsilon)$-approximation algorithm for the 
traveling salesman problem [8] to construct a short route r visiting all POIs exactly once. Then r is 
divided into n equal parts, one for each of the n sensors. Finally, each sensor is assigned to monitor 
one of the parts $r_i$ of r by moving forward and backward along $r_i$. This algorithm is generalized 
in [59] for the case of different sweep periods for the POIs, resulting in an algorithm with an 
approximation ratio of 3. 
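
The route-splitting step can be sketched in a few lines. This is only an illustration under stated assumptions: the route r is taken as already computed and given as an ordered list of planar points (the TSP approximation of [8] is not implemented), the route is treated as an open polyline, and all function names are hypothetical. The sketch relies on the observation that a sensor shuttling back and forth over a part of length L at speed v returns to every point of that part within 2L/v time units.

```python
import math

def route_length(route):
    """Total Euclidean length of an open polyline given as a list of (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(route, route[1:]))

def split_route(route, n):
    """Arc-length positions of the n + 1 breakpoints that cut the route into n equal parts."""
    total = route_length(route)
    return [total * k / n for k in range(n + 1)]

def sweep_period(route, n, v):
    """Worst-case revisit time when each of n sensors shuttles over one part at speed v."""
    part = route_length(route) / n
    return 2.0 * part / v

def sensors_needed(route, v, t):
    """Smallest n for which the shuttle scheme revisits every route point within time t."""
    return max(1, math.ceil(2.0 * route_length(route) / (v * t)))
```

For a route of length 7 and unit speed, two sensors give a revisit time of 7, and a sweep period of 5 would require three sensors under this scheme.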


21.4.2 Optimal Repositioning of Mobile Sensors 


The problem of repositioning the sensors so that they provide better barrier coverage while 
minimizing the distance they have to travel or the energy they need to consume is studied in 
[11,15,88]. Bhattacharya et al. [15] assume that n sensors are initially located in the interior of a 
planar region and study the problem of how to move the sensors to the boundary of the region so 
that the distance along the boundary between two consecutive sensor positions is the same. Hence, 
after repositioning, the sensor positions will form a regular n-gon that is called the destination polygon. 
We call the new position of each sensor the destination of that sensor. There are two versions of the 
problem: 


■ The min-max problem, aiming to minimize the maximum distance traveled by any sensor 
■ The min-sum problem, where the objective is to minimize the sum of the distances traveled 
by all sensors 


For both problems they consider two types of regions: a unit disk and a simple polygon. We discuss 
first the algorithms for the min-max problem and then those for the min-sum problem. 


21.4.2.1 Min-Max Problem 


For the min-max problem on a disk region, Bhattacharya et al. call a positive real number $\lambda$ 
feasible if all the sensors can move to new positions on the boundary of the disk that form a 
regular n-gon P such that the maximum distance between the old and the new position of any sensor does 
not exceed $\lambda$. Such a polygon P is called $\lambda$-feasible. Hence, the min-max problem is equivalent to 
the problem of finding the minimum feasible number $\lambda_{\min}$ and a $\lambda_{\min}$-feasible polygon. If we can 
construct an algorithm that checks the feasibility of any number in time T(n) and we know an interval 
containing $\lambda_{\min}$, then we can do a binary search on that interval, at each step halving the size 
of the interval containing $\lambda_{\min}$. Clearly, the interval [0, 2] contains $\lambda_{\min}$, since the distance between 
any two points in the unit disk cannot exceed its diameter. Hence, the running time of the resulting 
algorithm will be $T(n) \log(1/\varepsilon)$, where $\varepsilon > 0$ is the required accuracy. Using a more complex 
binary search algorithm that uses a finite set of candidate new-position points, Bhattacharya et al. 
show that the exact value of $\lambda_{\min}$ can be found in time $O(T(n) \log n)$. 
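
The approximate binary search described above can be sketched as follows. This is a minimal illustration, assuming a feasibility oracle is supplied by the caller and that feasibility is monotone (any value above a feasible one is also feasible); the function name and parameters are hypothetical, and the exact search over a finite candidate set is not shown.

```python
def min_feasible(is_feasible, lo=0.0, hi=2.0, eps=1e-6):
    """Bisect [lo, hi] for the smallest feasible value, to accuracy eps.

    Assumes hi is feasible and that feasibility is monotone.
    Uses about log2((hi - lo) / eps) oracle calls.
    """
    while hi - lo > eps:
        mid = (lo + hi) / 2.0
        if is_feasible(mid):
            hi = mid  # the minimum lies in the lower half
        else:
            lo = mid  # the minimum lies in the upper half
    return hi
```

With an oracle of cost T(n), the loop performs on the order of log(1/eps) oracle calls, which matches the running time stated in the text.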

For testing the feasibility of a number $\lambda > 0$ for n sensors at positions $A_1, \ldots, A_n$ inside a circle C, 
Bhattacharya et al. construct for each i a circle of radius $\lambda$ and center $A_i$ and consider the two 
intersection points of that circle with C. The resulting set Q contains 2n points. It is shown that, 
if $\lambda$ is feasible, then there is a $\lambda$-feasible n-gon one of whose vertices is in Q. Hence, assuming $\lambda$ 
is feasible, one can find a $\lambda$-feasible n-gon by checking each of the regular n-gons that contain a 
point of Q, whose number is at most |Q| = 2n. The problem is thus reduced to checking whether 
the vertices $B_1, \ldots, B_n$ of each of those 2n polygons can be mapped to distinct points among 
$A_1, \ldots, A_n$ so that the distance between each $B_i$ and the point it is mapped to is at most $\lambda$. The latter mapping 
problem can be solved using an algorithm due to [40] for finding a perfect matching in a bipartite 
graph with time complexity $O(n^{2.5})$. The total complexity of the resulting feasibility-checking 
algorithm is $O(n^{3.5})$, and the resulting complexity of the min-max algorithm is $O(n^{3.5} \log n)$. 
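
The feasibility test for the unit disk can be sketched as follows. This is an illustration under stated assumptions, not the implementation from [15]: the disk is centered at the origin, a simple augmenting-path matching replaces the faster matching algorithm cited in the text (so each polygon costs more than $O(n^{2.5})$ here), a small tolerance absorbs rounding errors, and all names are hypothetical.

```python
import math

def boundary_candidates(sensors, lam):
    """Set Q: points of the unit circle at distance exactly lam from some sensor."""
    q = []
    for ax, ay in sensors:
        d = math.hypot(ax, ay)
        if d == 0.0:
            continue  # the center is equidistant from the whole boundary
        x = (1.0 - lam * lam + d * d) / (2.0 * d)  # offset of the chord along OA
        if abs(x) > 1.0:
            continue  # the circle around this sensor misses the boundary
        h = math.sqrt(1.0 - x * x)
        ux, uy = ax / d, ay / d
        q.append((x * ux - h * uy, x * uy + h * ux))
        q.append((x * ux + h * uy, x * uy - h * ux))
    return q

def regular_ngon(vertex, n):
    """Regular n-gon inscribed in the unit circle with one vertex at `vertex`."""
    t0 = math.atan2(vertex[1], vertex[0])
    return [(math.cos(t0 + 2 * math.pi * k / n), math.sin(t0 + 2 * math.pi * k / n))
            for k in range(n)]

def has_perfect_matching(adj, n):
    """Augmenting-path matching; adj[i] lists polygon vertices reachable by sensor i."""
    match = [-1] * n  # match[j] = sensor currently assigned to vertex j

    def augment(i, seen):
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            if match[j] == -1 or augment(match[j], seen):
                match[j] = i
                return True
        return False

    return all(augment(i, set()) for i in range(n))

def is_feasible(sensors, lam, tol=1e-9):
    n = len(sensors)
    # If Q is empty, every sensor reaches either the whole boundary or none of it,
    # so a single arbitrary polygon decides the question.
    candidates = boundary_candidates(sensors, lam) or [(1.0, 0.0)]
    for q in candidates:
        poly = regular_ngon(q, n)
        adj = [[j for j, b in enumerate(poly) if math.dist(a, b) <= lam + tol]
               for a in sensors]
        if has_perfect_matching(adj, n):
            return True
    return False
```

For four sensors placed at distance 0.5 from the center along the coordinate axes, the test accepts 0.5 and rejects 0.4, since each sensor is exactly 0.5 away from the boundary.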

Tan and Wu [88] improve the complexity of the min-max algorithm for a disk from [15] by 
using a better characterization of $\lambda_{\min}$-feasible polygons. Specifically, they show that if $B_1, \ldots, B_n$ 
are the vertices of a $\lambda_{\min}$-feasible n-gon such that $|A_i B_i| \le \lambda_{\min}$ for all i, then either 


1. For some i such that $|A_i B_i| = \lambda_{\min}$, the line joining $A_i$ and $B_i$ contains the center of C, or 
2. For some $i \ne j$, $|A_i B_i| = |A_j B_j| = \lambda_{\min}$ 


Using this fact, one can construct a set of all n distances of type (1) and all, say m, distances 
of type (2). Doing a binary search on that set will yield in O(log(n + m)) feasibility tests the 
value of $\lambda_{\min}$ and the corresponding n-gon. Unfortunately, in the worst case m can be of order n’, 
which implies that the worst-case complexity of the resulting min-max algorithm will be O(⁄°). 
By employing a more elaborate search procedure, Tan and Wu [88] show that the complexity of 
their algorithm can be reduced to $O(n^{2.5} \log n)$. 

For the min-max problem in a simple-polygonal region P, Bhattacharya et al. [15] show that 
their algorithm for disk regions can be adapted, resulting in an algorithm of time complexity 
$O(l n^{3.5} \log n)$, where l is the number of vertices of P. The additional factor of l comes from 
the fact that the intersection of a circle centered at a sensor and the boundary of P can consist of 
up to l points, unlike in the disk-region problem, where it consists of at most two points. 


21.4.2.2 Min-Sum Problem 


Unlike the min-max problem, for the min-sum version no exact polynomial algorithm is known 
yet, nor is it known whether the problem is NP-hard. The reason is that, for the 
min-sum problem, no characterization of the optimal polygon is known that would allow for reducing 
the search space from continuous to discrete, as in the min-max version. Instead, it is shown 
in [15] that the destination of at least one sensor in any optimal n-gon belongs to a specified 
short segment along the circle C. Based on that fact, the corresponding segment for each of the 
sensors $A_i$ is discretized by adding $O(1/\varepsilon)$ equally spaced points, each of which is then considered 
as a candidate destination for $A_i$. Then, for each sensor and candidate, a minimum-cost 
weighted matching problem is solved for a weighted graph whose nodes are the sensors $A_i$ and 
the vertices of the currently considered n-gon candidate, whose edges join each sensor to each 
polygon vertex, and whose edge weights equal the Euclidean distances between the endpoints. The matching problem 
can be solved in $O(n^3)$ time using the algorithm from [54]. The complexity of the resulting 
min-sum approximation algorithm is $O(n^4/\varepsilon)$ and the approximation ratio is $1 + \varepsilon$. A similar 
approximation algorithm can be constructed for the min-sum problem for a simple-polygon region 
with time complexity $O(l n^4/\varepsilon)$ and approximation ratio $1 + \varepsilon$, where l is the number of vertices 
of the polygon. 
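
The discretize-and-match idea can be illustrated with a simplified sketch. It departs from [15] in two labeled ways: candidate polygons are obtained by sampling the rotation angle uniformly over one full period $2\pi/n$ rather than over the short per-sensor segments, and the assignment is solved by brute force over permutations (practical only for very small n) instead of the matching algorithm of [54]. No approximation guarantee is claimed for this sketch, and all names are hypothetical.

```python
import itertools
import math

def ngon(theta, n):
    """Regular n-gon inscribed in the unit circle, rotated by angle theta."""
    return [(math.cos(theta + 2 * math.pi * k / n), math.sin(theta + 2 * math.pi * k / n))
            for k in range(n)]

def min_sum_assignment(sensors, poly):
    """Cheapest one-to-one assignment of sensors to polygon vertices (brute force)."""
    return min(sum(math.dist(a, poly[j]) for a, j in zip(sensors, perm))
               for perm in itertools.permutations(range(len(poly))))

def approx_min_sum(sensors, samples):
    """Try `samples` equally spaced rotations of the destination n-gon; keep the cheapest."""
    n = len(sensors)
    best_cost, best_theta = math.inf, 0.0
    for s in range(samples):
        theta = (2 * math.pi / n) * s / samples  # rotations repeat with period 2*pi/n
        cost = min_sum_assignment(sensors, ngon(theta, n))
        if cost < best_cost:
            best_cost, best_theta = cost, theta
    return best_cost, best_theta
```

Increasing `samples` plays the role of decreasing $\varepsilon$: a finer discretization brings the best sampled polygon closer to an optimal one at a proportionally higher cost.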

Tan and Wu [88] consider a special version of the min-sum problem, where the sensors are 
initially positioned on C. For that version, they show that an exact polynomial-time algorithm 
for the min-sum problem does exist, and its complexity is $O(n^4)$. Their algorithm is based on a 
characterization of the optimal solution that limits the search space for destination polygons to a 
discrete set. Specifically, they show that in any optimal solution, there exists at least one sensor $A_i$ 
whose destination is $A_i$, that is, a sensor that does not change its position. 


21.5 Other Coverage Issues 


21.5.1 Wireless Links and Connectivity 


There exists a large literature on simultaneous maintenance of coverage and connectivity. As we 
already stated, originally the sensor research community was targeting wireless sensor nodes with 
ultra-low-power radios and multi-hop communication. This type of wireless link has been widely 
studied, both experimentally and using statistical generalization, in terms of transmission properties 
as well as quality of link vs. energy consumption properties. Unfortunately, many of these studies 
are to a serious extent unrealistic because it was not recognized that, in the radio consumption model, 
listening is often as expensive as receiving or transmitting. 

It has been recognized that there exists high positive and negative correlation in link qualities, 
both spatially and in the temporal domain. Some of the key references in these domains are 
[17,18,73,102]. With the change of architecture of wireless sensor networks from ultra-low- 
power multi-hop communication to communication using wireless phone infrastructure, many 
fundamental assumptions about the role of communication in coverage tasks are drastically altered. 
For example, in this new architecture, it is very rarely the case that communication is the bottleneck 
and much higher emphasis is on the use of sensors in the best possible way. 


21.5.2 Multi-Objective Coverage 


Multi-objective coverage is one where at least two objectives or two constraints have to be addressed 
during node deployment or operation. The initial literature focused on maintaining sensing coverage 
and connectivity in large sensor networks [92,98,102]. In this situation, the key assumption is 
related to the ratio of communication range and sensing domain. In particular, a very interesting 
situation is when these two quantities are of relatively similar magnitude. These problems may not be 
an issue in mobile phone-based sensor networks, but multi-objective coverage is bound to emerge as one of 
the most important definitions of coverage. 

For example, in many security applications, it is essential that we observe the enemy while the 
enemy is not able to observe us. Also, it is easy to imagine that in many types of coverage one has 
to ensure that fundamentally different types of sensors are able to collect information (e.g., audio 
and visual sensors). These sensors may not only have different sensitivity ranges but also may 
or may not be directional, with various angles of coverage. The key goal here is to develop adequate and 
simple-to-use sensing models as well as to find which type of sensor fusion is most relevant in a 
particular application. 


21.5.3 Localized Algorithms Coverage 


Localized algorithms are those that are executed on a small number of sensor nodes that are close 
to each other in terms of quality of their communication links and/or in terms of sensed events. 
Localized algorithms are important for several reasons. They are intrinsically low energy and fault 
tolerant. Localized protocols usually induce much lower latency and preserve bandwidth. Finally, 
in very large networks they are the only practical alternative. 

A comprehensive but by now somewhat outdated survey on localized algorithms was 
published in 2004 [32]. Several authors have been able to develop localized coverage algorithms 
that are optimal or competitive with corresponding centralized algorithms [43]. Interestingly, even 
algorithmic paradigms have been developed for the creation of localized algorithms [70,86]. The key 
idea is to use as a starting point any regular centralized algorithm. The results of the pertinent 
centralized algorithm provide statistical knowledge about which information should be used in 
which way in the corresponding localized algorithm. The final step is to use statistical validation techniques 
for the evaluation of the localized algorithm. It is important to emphasize that different instances 
of the coverage problem should be used for the learning and testing phases. Of course, for best 
performance the whole procedure is iterated until the specified level of discrepancy 
between the centralized and the localized algorithms is reached. 


21.5.4 Lifetime and Energy-Efficient Coverage 


It was realized early that energy is one of the most severe constraints in wireless sensor networks. 
For example, Srivastava et al. [84] recognized that in the Smart Kindergarten project, batteries 
have to be changed at least once per day and that in order to instrument a sufficient number of 
subjects (kids) for the duration of the project, one would spend millions of dollars only on batteries. 
Therefore, a number of approaches have been developed to maintain one or more formulations of 
coverage while minimizing energy consumption. 

The main idea is to schedule different subsets of sensors to be active at any given point in time in 
such a way that each subset is sufficient to guarantee the coverage objective 
while the number of subsets is maximized. It is related to the well-known k-coverage problem 
in the graph-theoretic literature, which is NP-complete. Interestingly, in many applications with a 
relatively small number of nodes (up to several hundred), one can obtain the optimal solution 
using integer linear programming (ILP) [47,69]. There are also a very large number of survey papers 
that are completely dedicated to energy-efficient strategies in wireless ad hoc and sensor networks 
[6,27,38,61]. In particular, a large number of heuristics have been developed to maintain network 
coverage using low duty-cycle sensors [26,30,44,71]. 
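
The scheduling idea can be illustrated with a simple greedy heuristic that builds disjoint subsets of sensors, each covering all targets. This is a hypothetical sketch, not one of the cited heuristics or the ILP formulation: the input format (a dictionary from sensor identifier to the set of targets it covers) and the function name are assumptions, and the result is in general not optimal.

```python
def greedy_disjoint_covers(coverage, targets):
    """Greedily build disjoint sensor subsets, each covering every target.

    coverage: dict mapping a sensor id to the set of targets it covers.
    Returns a list of covers (lists of sensor ids); heuristic, not optimal.
    """
    targets = set(targets)
    if not targets:
        return []
    unused = set(coverage)
    covers = []
    while True:
        # sorted() makes tie-breaking deterministic
        uncovered, cover, pool = set(targets), [], sorted(unused)
        while uncovered:
            # pick the unused sensor that covers the most still-uncovered targets
            best = max(pool, key=lambda s: len(coverage[s] & uncovered), default=None)
            if best is None or not coverage[best] & uncovered:
                return covers  # the remaining sensors cannot complete another cover
            cover.append(best)
            pool.remove(best)
            uncovered -= coverage[best]
        covers.append(cover)
        unused -= set(cover)
```

Each returned subset can be activated in turn while the other sensors sleep, which is the sense in which a larger number of subsets is desirable.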


21.5.5 Fault Tolerance and Errors 


There are two major sources of sensing data errors that have been widely considered. The first is 
that sensor measurements may provide incorrect values. The second source of error is less dangerous 
for the accuracy and the correctness of the evaluation of coverage and is related to missing data. 

There are three main types of errors that have high impact on coverage algorithms and applications. 
The first is related to readings of detection sensors. The second is associated with location 
errors [82]; these are particularly important for mobile sensors. These two types of errors may 
take the form of either missing data or incorrect measurements. The final type is related to communication 
using lossy links and is of the missing-data nature. Note that once real-time issues are considered, a 
new type of error related to late-arriving data emerges. It is important to note that in more complex 
scenarios, new types of errors may play important roles. For example, if nodes use a sleep mode for 
energy conservation, errors in time synchronization may be of essential importance [37,43]. 

There is a tremendous amount of literature on errors in sensor measurement data. By far, the most popular 
approach is to assume independent errors that follow a Gaussian distribution. A number of interesting 
and theoretically important results are established under these assumptions. Unfortunately, 
the actual errors in real data essentially always have highly nonparametric distributions 
and rather high spatial and temporal correlations. It has been demonstrated that assuming a Gaussian 
error distribution may result in location errors that are several orders of magnitude higher than if 
nonparametric models that consider correlations are used for location discovery [31]. Conceptually, 
the most difficult problem with error modeling is that in many applications the corresponding signals 
are nonstationary. 

There have been several efforts to accurately and realistically model errors of individual sensors 
[34,51] and errors and communication links of a system of sensor and wireless nodes [48,49]. 

There is a complex interplay between error properties and optimization techniques used for 
calculating or optimizing coverage. In some situations, there are readily available provably optimal 
solutions. For example, if the coverage problem can be optimally solved using error-free data and 
if the error model is Gaussian, convex programming optimally addresses the same problem in the presence 
of errors. Unfortunately, this situation rarely has practical benefits [52]. The impact of 
realistic error models is discussed in detail using several sensor network applications [82]. 

In many scenarios, sensor networks for coverage are deployed in hostile environments where 
repair is either difficult or essentially impossible. In some scenarios, the environment is harsh and 
may have a highly negative impact on the reliability of the sensors. In essentially all scenarios in which 
sensor networks are used to establish coverage, the networks are not attended by humans. Therefore, it has been 
recognized that there is a need for fault-tolerant coverage. 

The most natural and the most popular way to ensure fault tolerance is through the use of 
redundancy [24]. In particular, k-cover algorithms simultaneously provide both energy efficiency 
and fault tolerance [1,47,53]. Interestingly, a much more efficient approach can be derived when 
fault tolerance is treated within the framework of sensor fusion [50,75]. 


21.5.6 Dealing with Uncertainty 


Coverage under uncertainty in terms of locations of nodes has been widely studied [13,25,36,72,87]. 
Many of these efforts use mathematically sophisticated concepts (e.g., homology) or verification 
techniques. We expect that other degrees of freedom of uncertainty will soon be addressed. For 
example, probabilistic or, even better, statistical guarantees of the coverage quality in the presence 
of uncertainty about the actual actions of the other side (attacker, intruder) will be essential in many 
applications. One potential framework to address these issues is the use of game theory. 


21.5.7 Visual Coverage 


Among the key predecessors of coverage are tasks in computational geometry such as art gallery 
observation by a limited number of agents. It is assumed that an agent can detect an object at an 
arbitrary distance unless the object is hidden by a wall. The problem asks to deploy the smallest 
number of art gallery employees in such a way that there does not exist any area of the gallery that 
is not observed by at least one employee. In many security applications, as well as in entertainment 
applications, visual information is of the utmost importance. Therefore, in the last 5 years, visual 
coverage emerged as one of the most popular topics. There are several surveys that treat this 
important problem in great detail [19,85]. 

In addition, there is a survey by Georgia Institute of Technology researchers that covers multimedia 
wireless sensor networks and is concerned with both data acquisition and data transmission 
[4]. The main conceptual difference between the standard definition of coverage and visual coverage 
is that cameras have a directional field of view and a rather large but nevertheless 
limited sensing range. A very important assumption concerns the ability to rotate the camera 
as required by tracking or coverage needs. As a consequence of these intricate sensor models, very 
intriguing and challenging optimization problems arise. It is surprising that a significant number 
of these can be solved in provably optimal ways using polynomial-time algorithms 
[2,3,14,39,42,78,83,97,101]. 


21.5.8 Security 


Security is one of the most important parameters in many mobile and unattended systems. In 
addition to papers published at many wireless, sensor, and security conferences, even dedicated 
conferences for wireless security attract a large number of submissions. Essentially, all security 
issues related to system security directly apply to coverage in sensor networks. It is not surprising 
that security of coverage results is of high importance. After all, coverage problems are very often 
directly related themselves to security applications. There are a large number of surveys on security 
in sensor networks [22,28,76]. 

In addition, there are at least two security challenges that are specific to sensor networks 
and coverage. The first is the issue of physical attacks. Usually, security attacks require sophisticated 
mathematical, software, or system techniques. Therefore, they can be undertaken only by experts 
in these fields and with significant effort. However, sensor readings can be easily altered using a 
corresponding source of excitation. For example, one can easily increase the temperature of a sensor 
or alter the speed of acoustic signal propagation using dust. These types of attacks can easily result in 
greatly incorrect distance, location, or other measurements [23]. The development of techniques 
that mitigate or, even better, eliminate such impacts is of high importance. 

The second issue is that, in addition to ensuring correct measurements, one needs to ensure that each 
of the measurements is collected by a sensor deployed by a trusted party, at the exact location where the 
sensor was initially deployed, and at exactly the time at which the data are claimed to have been collected. Recently, 
several such solutions that utilize the notion of a public physical unclonable function (PPUF) [12] 
have been developed [67,77]. The key idea is to combine challenges and/or GPS as inputs to one 
or more PPUFs. The characteristics of PPUFs are such that any attempt to separate or replace them 
destroys their characteristics and therefore their security properties. 


21.5.9 Emerging Directions 


Initial efforts on coverage in sensor networks have formulated and solved several canonical problems. 
There are exponentially many new formulations that consider more and more issues or accept more 
complex and detailed sensing models as well as object movement. While many of them are 
interesting and technically challenging, there is still an ongoing search for killer applications of 
large and profound practical importance. Also, several basic problems such as static coverage with 
respect to static objects are still not completely answered. 

There are too many new applications for any survey or even book to cover. Due to space 
limitations, we just very briefly go through two new applications: mobile wireless health [46] and 
energy harvesting [66,93-95]. In addition, we briefly discuss the related and intriguing emerging 
topic of local sensing using global sensors [79,80]. 

We illustrate issues in coverage problems using a very small cross-section of wireless health research, 
specifically, medical shoes. Medical shoes are instrumented with a large number of sensors that 
record pressure below each small area of the sole, as well as several other types of sensors (e.g., accelerometers) 
[74,75]. These remarkably simple systems are capable of facilitating remarkably broad sets of 
diagnoses and of supporting a wide spectrum of medical treatments. However, these systems are 
rather expensive and have high energy budgets. It has been recently demonstrated that both can be 
reduced by more than an order of magnitude by using the notion of semantic coverage. Semantic 
coverage does not detect all events, but only those that are relevant for medical purposes [93-95]. 
Therefore, in a sense it provides a natural bridge between coverage and general sensor fusion that 
is driven by applications. 

We use the term “global sensors” for large sensors that simultaneously sense multiple locations. 
Probably the best illustration is one where a single sensor is used to sense pressure from any of k 
keys of a keyboard. At first this approach to coverage of events (where an event is any single key of a keyboard sensing 
pressure) may sound counterintuitive. However, it results in great energy savings. For example, if 
we just want to detect whether any key is activated under the standard one-key-one-sensor scheme, 
we need as many sensor readings as there are keys. However, if each sensor covers k keys, this 
requirement is reduced by a factor of k. Judicious placement of such global sensors can ensure 
complete coverage of keys while reducing energy requirements by more than an order of magnitude 
[79,80]. Although the first algorithms have been proposed and are very effective, we still know 
rather little about the advantages and limitations of the use of global sensors for local sensing. 


21.6 Conclusion 


We have surveyed the history, state of the art, and trends of coverage in sensor networks. Since 
comprehensive and complete coverage is out of the question due to the tremendous amount of 
research, we placed emphasis on the most important conceptual and practical issues. Even then, 
only a small slice of research results is covered. Nevertheless, we hope that this chapter will help 
practitioners and researchers new to the field obtain a better global picture of coverage in sensor 
networks. 


Distributed solutions for signal processing techniques are important for establishing large-scale 
monitoring and control applications. They enable the deployment of scalable sensor networks for 
particular application areas. Typically, such networks consist of a large number of vulnerable 
components connected via unreliable communication links and are sometimes deployed in harsh 
environments. Therefore, the dependability of sensor networks is a challenging problem. An efficient and 
cost-effective answer to this challenge is provided by employing runtime reconfiguration techniques 
that assure the integrity of the desired signal processing functionalities. Runtime reconfigurability 
has a thorough impact on system design, implementation, testing/validation, and deployment. The 
presented research focuses on the widespread signal processing method known as state estimation, 
with Kalman filtering in particular. To that end, a number of distributed state estimation 
solutions that are suitable for networked systems in general are overviewed, after which the robustness 
of the system is improved according to various runtime reconfiguration techniques. 


22.1 Introduction 


Many people in our society manage their daily activities based on knowledge and information 
about, for example, weather conditions, traffic jams, pollution levels, oil reservoirs, and energy 
consumption. Sensor measurements are the main source of information when monitoring these 
surrounding processes. Moreover, a trend is to increase the number of sensors, as they have become 
smaller, cheaper, and easier to use, so that large-area processes can be monitored with higher 
accuracy. To that end, sensors are embedded in a communication network, creating a so-called 
sensor network, which typically consists of sensor nodes linked via a particular network topology 
(Figure 22.1). Each sensor node combines multiple sensors, a central processing unit (CPU), and a 
(wireless) communication radio on a circuit board. Sensor networks have three attractive properties 
for system design: they require low maintenance, create “on-the-fly” (ad hoc) communication 
networks, and can maintain large numbers of sensors. 

Nowadays, sensor nodes are commercial off-the-shelf products and give system designers new 
opportunities for acquiring measurements. Although they make sensor measurements available 
in large quantities, solutions for processing these measurements automatically are hampered by 
limitations in the available resources, such as energy, communication, and computation. 

Energy plays an important role in remotely located processes. Such processes are typically 
observed by severely energy-limited sensor nodes (e.g., powered by battery or energy scavenging) 
that are not easily accessible and thus should have a long lifetime. Some applications even deploy 
sensor nodes in the asphalt of a road to monitor traffic or in the forest to collect information on 
habitats. See, for example, the applications described in [1,2] and recent surveys on sensor networks 
in [3-6]. To limit energy consumption, one often aims to minimize the usage of communication 
and computational resources in sensor nodes. However, there are other reasons why these latter 
two resources should be used wisely. 

Limited communication mainly results from upper bounds on the network capacity, as 
established by the Shannon–Hartley theorem for communication channels presented in [7]. 


Figure 22.1 Sensor nodes in a mesh and star network topology with some examples of nodes: 
Tmote-Sky (top-left), G-node (bottom-left), and Waspmote (right). 


Self-Organizing Distributed State Estimators m 485 


The theorem shows that the environment in which nodes communicate influences the amount of data that 
can be exchanged without errors. In addition, communication is affected by packet loss as well, 
which occurs due to message collision (i.e., simultaneous use of the same communication channel 
by multiple transmitters). Hence, a suitable strategy for exchanging data is of importance to cope 
with the dynamic availability of communication resources. 
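
For reference, the Shannon–Hartley bound mentioned above is commonly stated as follows, where C is the channel capacity in bits per second, B is the bandwidth in hertz, and S/N is the signal-to-noise power ratio. These symbols are chosen here for illustration and are not part of the chapter's notation.

```latex
C = B \log_2\!\left(1 + \frac{S}{N}\right)
```

A noisier environment (smaller S/N) therefore lowers the rate at which nodes can exchange data without errors.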

Computational demand is related to the algorithms performed in sensor networks for processing 
the measurements. The established centralized solutions, where measurements are processed by a 
single node, fail for large-scale networks even when communication is not an issue: With an 
increasing number of sensor nodes, the computational load of a centralized solution will grow 
polynomially, up to a point where it is no longer feasible or is highly inefficient. To that end, 
non-centralized solutions are explored that aim to make use of local CPUs that are already present 
in each node. 

A straightforward consequence of the resource limitation, the scale, and the often-hostile 
embedding environment is that fault-tolerance and/or graceful degradation are critical requirements 
for large-scale distributed systems. This means that the sensor network should be able to cope with 
situations that emerge from common operational events, such as node failures, sensor degradation, 
and power loss. Building in redundancy to cover the anticipated failure modes may result in 
complex, prohibitively expensive implementations. Instead, dynamical system architectures are 
to be realized via runtime reconfiguration, as it realizes a networked system that can follow the 
changes in the internal and external operational conditions and assure optimal use of available 
resources. 

Limitations of the earlier-mentioned resources are important design parameters. Depending 
on the sensor network application at hand, suitable trade-offs must be made to enable a feasible 
and practical deployment. One of these trade-offs is the local processing-communication trade-off. 
This encourages the local processing of the sensor measurements rather than communicating 
them, since exchanging 1 bit typically consumes much more energy than processing 1 bit. Hence, 
centralized methods for processing measurements are impractical, due to their significant impact 
on the communication requirements. To solve this issue, distributed signal processing methods 
are increasingly studied. Such methods seek a more efficient use of the spatially distributed 
computation and sensing resources according to the network topology. The signal processing 
method addressed in this chapter is state estimation. 

Well-studied state estimation methods are the Kalman filter (KF) for linear processes, with 
extensions known as the extended KF and unscented KF for nonlinear processes; see, for example, 
[8-10]. Apart from their centralized solutions, some distributed implementations are found in 
[11-19]. Typically, these distributed solutions perform a state estimation algorithm locally in each 
node and thereby compute a local estimate of the global state vector. Note that these distributed 
solutions can thus be regarded as a network of state estimators. However, they were not designed 
to cope with the unforeseen operational events that will be present in the system, nor do they address 
deliberate reconfigurations of a sensor network during operation.* 

Therefore, the contribution of this chapter is to integrate solutions on distributed Kalman
filtering with a framework of self-organization. To that end, each node not only employs a state
estimator locally but additionally performs a management procedure that supports the network
of state estimators to establish self-organization. The outline of this chapter is as follows. First,
we address the used notation, followed by a problem description in Section 22.3. Section 22.4
then presents several existing solutions on distributed Kalman filtering, with their required resources
in Section 22.5, for which a supportive management procedure is designed in Section 22.6. The
proposed network of self-organizing state estimators is further analyzed in Section 22.7 in an
illustrative example, while concluding remarks are summarized in Section 22.8.

* For example, a reduction of the sampling rate of nodes that are running out of battery power, so as to
save energy and increase their lifetime.


22.2 Notation and Preliminaries 


$\mathbb{R}$, $\mathbb{R}_+$, $\mathbb{Z}$, and $\mathbb{Z}_+$ define the set of real numbers, nonnegative real numbers, integer numbers,
and nonnegative integer numbers, respectively. For any $C \subset \mathbb{R}$, let $\mathbb{Z}_C := \mathbb{Z} \cap C$. The notation 0 is
used to denote zero, the null-vector, or the null-matrix of appropriate dimensions. The transpose,
inverse (when it exists), and determinant of a matrix $A \in \mathbb{R}^{n \times n}$ are denoted by $A^\top$, $A^{-1}$, and
$|A|$, respectively. Further, $\{A\}_{qr} \in \mathbb{R}$ denotes the element in the qth row and rth column of A.
Given that $A, B \in \mathbb{R}^{n \times n}$ are positive definite, denoted by $A \succ 0$ and $B \succ 0$, then $A \succ B$ denotes
$A - B \succ 0$. $A \succeq 0$ denotes that A is positive semi-definite. For any $A \succ 0$, $A^{1/2}$ denotes its
Cholesky decomposition and $A^{-1/2}$ denotes $(A^{1/2})^{-1}$. The Gaussian function (Gaussian in short)
of vectors $x, \mu \in \mathbb{R}^n$ and matrix $\Sigma \in \mathbb{R}^{n \times n}$ is denoted by $G(x, \mu, \Sigma)$, for which $\Sigma \succ 0$ holds.
Any Gaussian function $G(x, \mu, \Sigma)$ can be illustrated by its corresponding ellipsoidal sub-level-set
$\mathcal{E}_{\mu,\Sigma} := \{x \in \mathbb{R}^n \mid (\mu - x)^\top \Sigma^{-1} (\mu - x) \le 1\}$. See Figure 22.2 for a graphical explanation of a
sub-level-set.
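As a small illustration of the sub-level-set definition, the following Python sketch tests whether a point lies inside the ellipsoid belonging to a Gaussian with mean mu and covariance Sigma. The function name `in_sublevel_set` and the numerical values are illustrative assumptions, not part of the chapter.

```python
import numpy as np

def in_sublevel_set(x, mu, Sigma):
    """Return True if (mu - x)^T Sigma^{-1} (mu - x) <= 1."""
    d = np.asarray(mu, dtype=float) - np.asarray(x, dtype=float)
    # Solve Sigma * s = d instead of forming the inverse explicitly.
    return float(d @ np.linalg.solve(Sigma, d)) <= 1.0

# Hypothetical example: an axis-aligned ellipse with semi-axes 2 and 1.
mu = np.zeros(2)
Sigma = np.diag([4.0, 1.0])
inside = in_sublevel_set([1.9, 0.0], mu, Sigma)
outside = in_sublevel_set([0.0, 1.1], mu, Sigma)
```

A larger covariance enlarges the ellipsoid, which is why the sub-level-set is a convenient picture of estimation uncertainty.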


22.3 Problem Formulation 


Let us consider a linear process that is observed by a sensor network with the following description. 


Networked System  The network consists of N sensor nodes, in which a node $i \in \mathcal{N}$ is identified
by a unique number within $\mathcal{N} := \mathbb{Z}_{[1,N]}$. The set $\mathcal{N}_i \subset \mathcal{N}$ is defined as the collection of all
nodes $j \in \mathcal{N}$ that have a direct network connection with node i, that is, node i exchanges
data with node j.

Process  Each node $i \in \mathcal{N}$ observes a perturbed, dynamical process according to its local sampling
time $\tau_i \in \mathbb{R}_{>0}$. Therefore, the discrete-time process model of node i, at the $k_i$th sampling
instant, yields

$$
\begin{aligned}
x[k_i] &= A_{\tau_i} x[k_i-1] + w[k_i-1],\\
y_i[k_i] &= C_i x[k_i] + v_i[k_i].
\end{aligned}
\tag{22.1}
$$


Figure 22.2 An illustrative interpretation of the sub-level-set $\mathcal{E}_{\mu,\Sigma}$.

The state vector and local measurement are denoted as $x \in \mathbb{R}^n$ and $y_i \in \mathbb{R}^{m_i}$, respectively,
while process-noise $w \in \mathbb{R}^n$ and measurement-noise $v_i \in \mathbb{R}^{m_i}$ follow the Gaussian distributions
$p(w[k_i]) := G(w[k_i], 0, Q_{\tau_i})$ and $p(v_i[k_i]) := G(v_i[k_i], 0, V_i)$, for some $Q_{\tau_i} \in \mathbb{R}^{n \times n}$
and $V_i \in \mathbb{R}^{m_i \times m_i}$. A method to compute the model parameters $A_{\tau_i}$ and $Q_{\tau_i}$ from the
corresponding continuous-time process model $\dot{x} = Fx + w$ is the following:

$$
A_{\tau_i} := e^{F\tau_i} \quad\text{and}\quad Q_{\tau_i} := B_{\tau_i}\,\mathrm{cov}\big(w(t-\tau_i)\big)\,B_{\tau_i}^\top,
\quad\text{with}\quad B_{\tau_i} := \int_0^{\tau_i} e^{F\eta}\,d\eta.
$$
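The discretization above can be sketched numerically. The following Python example evaluates the matrix exponential and its integral by a truncated power series; the function name `discretize`, the truncation length `terms`, and the constant-velocity model are assumptions made for illustration only.

```python
import numpy as np
from math import factorial

def discretize(F, cov_w, tau, terms=20):
    """Return (A_tau, Q_tau) for x_dot = F x + w via truncated power series.

    A_tau = exp(F tau) = sum_k F^k tau^k / k!
    B_tau = int_0^tau exp(F eta) d eta = sum_k F^k tau^(k+1) / (k+1)!
    """
    n = F.shape[0]
    A = np.zeros((n, n))
    B = np.zeros((n, n))
    F_pow = np.eye(n)  # holds F**k
    for k in range(terms):
        A += F_pow * tau**k / factorial(k)
        B += F_pow * tau**(k + 1) / factorial(k + 1)
        F_pow = F_pow @ F
    Q = B @ cov_w @ B.T
    return A, Q

# Hypothetical constant-velocity model: state = (position, velocity).
F = np.array([[0.0, 1.0], [0.0, 0.0]])
cov_w = np.diag([0.0, 0.5])
A_tau, Q_tau = discretize(F, cov_w, tau=2.0)
```

Because F is nilpotent in this example, the series terminates after two terms and the result is exact; for a general F the truncation length would have to be chosen with care.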


The goal of the sensor network is to compute a local estimate of the global state x in each
node i. Note that the process model is linear and both noises are Gaussian distributed. As such, it
is appropriate to assume that the local estimate is Gaussian distributed as well, that is, $p_i(x[k_i]) :=
G(x[k_i], \hat{x}_i[k_i], P_i[k_i])$ for some mean $\hat{x}_i[k_i] \in \mathbb{R}^n$ and error-covariance $P_i[k_i] \in \mathbb{R}^{n \times n}$. This further
implies that one can adopt a distributed KF solution in the sensor network for state estimation, for
example, [11,13-19]. Such solutions typically compute a local estimate of x in each node i based
on $y_i$ and on the data exchanged by its neighboring nodes $j \in \mathcal{N}_i$. Existing methods on distributed
Kalman filtering present an a priori solution on what data should be exchanged, at what time, and
with which nodes. Hence, for a given sensor network, a matched (static) estimation procedure
is derived per node under predefined conditions. Such static estimation procedures are infeasible
when deploying large-scale networked systems. Broken communication links, newly added nodes
to an existing network, node failures, and depleted batteries are just a few examples of operational
events likely to occur in large-scale sensor networks. Solutions should thus be in place that enable
the (data processing) sensor network to cope with these configuration changes by reconfiguring
its own operation at runtime. These topics are often addressed by methods that establish a self-organizing
network, in which a feasible solution for unforeseen system changes is sought during
the operation of a network rather than at design time.

Therefore, this chapter investigates a self-organizing sensor network with the purpose of
estimating the state vector of large-area processes (Figure 22.3). More specifically, the problem
addressed is to integrate state-of-the-art results in distributed Kalman filtering with applicable
solutions for establishing a self-organizing networked system. The (modified) Kalman filtering
algorithms performed in the different nodes interact with each other via a management layer
"wrapped around" the KF. The management layer is responsible for parameterization and topology
control, thus assuring coherent operational conditions for its corresponding estimator. Note that
this warrants a two-way interaction between the modified KF and the management layer. Let us
next present the state of the art in distributed Kalman filtering, before addressing the solutions
that establish a self-organizing networked system.


[Figure 22.3: network diagram; legend: communication (1 and 2 way), modified Kalman filter, management layer.]

Figure 22.3 A network of Kalman filters with supporting management layer to realize the
self-organizing property of the network.
22.4 Distributed Kalman Filtering


The linear process model of (22.1) is characterized by Gaussian noise distributions. A well-known
state estimator for linear processes with Gaussian noise distributions is the KF, formally introduced
in [9]. Since many distributed implementations of the KF make use of its original algorithm, let us
define the Kalman filtering function $f_{KF} : \mathbb{R}^n \times \mathbb{R}^{n \times n} \times \mathbb{R}^{n \times n} \times \mathbb{R}^{n \times n} \times \mathbb{R}^m \times \mathbb{R}^{m \times n} \times \mathbb{R}^{m \times m} \to
\mathbb{R}^n \times \mathbb{R}^{n \times n}$. Different nodes will employ this function. Therefore, let us present a generalized
characterization of $f_{KF}$ independent of the node index i. To that end, let $y[k] \in \mathbb{R}^m$ denote a
measurement sampled at the synchronous sampling instants $k \in \mathbb{Z}_+$ with a sampling time of
$\tau \in \mathbb{R}_+$, according to the following description:

$$y[k] = Cx[k] + v[k], \qquad p(v[k]) = G(v[k], 0, V). \tag{22.2}$$

Then, a characterization of the Kalman filtering function, which computes updated values of the
state estimates $\hat{x}[k]$ and $P[k]$ based on $y[k]$ in (22.2), yields

$$(\hat{x}[k], P[k]) = f_{KF}(\hat{x}[k-1], P[k-1], A_\tau, Q_\tau, y[k], C, V), \tag{22.3}$$

$$
\begin{aligned}
\text{with}\quad M &= A_\tau P[k-1] A_\tau^\top + Q_\tau;\\
K &= M C^\top (C M C^\top + V)^{-1};\\
\hat{x}[k] &= A_\tau \hat{x}[k-1] + K\,(y[k] - C A_\tau \hat{x}[k-1]);\\
P[k] &= (I_n - KC)M.
\end{aligned}
\tag{22.4}
$$
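A minimal Python sketch of the Kalman filtering function may help to fix the order of the computations in (22.4). The name `f_kf` and the example model are assumptions made for illustration; the arithmetic follows the four lines of (22.4).

```python
import numpy as np

def f_kf(x_prev, P_prev, A, Q, y, C, V):
    """One Kalman filter step following (22.4): returns (x_hat, P)."""
    M = A @ P_prev @ A.T + Q                      # predicted error-covariance
    K = M @ C.T @ np.linalg.inv(C @ M @ C.T + V)  # Kalman gain
    x_pred = A @ x_prev                           # predicted mean
    x_hat = x_pred + K @ (y - C @ x_pred)         # measurement update
    P = (np.eye(len(x_prev)) - K @ C) @ M
    return x_hat, P

# Hypothetical two-state model in which only the first state is measured.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
Q = 0.01 * np.eye(2)
C = np.array([[1.0, 0.0]])
V = np.array([[0.25]])
x_hat, P = f_kf(np.zeros(2), np.eye(2), A, Q, np.array([1.2]), C, V)
```

In a node, the returned pair would be fed back as `x_prev` and `P_prev` at the next sampling instant, which is the iterative inclusion of measurements mentioned in the text.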


The KF is a successful and well-studied state estimator; see, for example, some assessments presented
in [20-22]. Its success is based on three aspects:

■ Measurements are included iteratively.

■ The estimation error $x - \hat{x}$ is asymptotically unbiased and attains the minimal quadratic
value of the error-covariance P.

■ The Kalman filtering algorithm is computationally tractable.


Therefore, when distributed solutions for state estimation became apparent, the Kalman filtering
strategy was often the starting point for any novel distributed state estimator. Moreover, many
of the ideas explored in distributed Kalman filtering are easily extendable toward distributed state
estimation in general. A summary of these ideas is given in the next sections, as it facilitates the
decision on how to compute a node's local estimate $p_i(x)$.

The overview on distributed Kalman filtering distinguishes two different approaches. In the first
approach, nodes exchange their local measurement, while in the second approach nodes share their
local estimate (possibly in addition to exchanging local measurements). This second approach was
proposed in recent solutions on distributed Kalman filtering, as it further improves the estimation
results in the network. For clarity of exposition, solutions are initially presented with synchronized
sampling instants $k \in \mathbb{Z}_+$, that is, each node i has the same sampling time $\tau \in \mathbb{R}_+$. After
that, modifications are given to accommodate asynchronous sampling instants $k_i \in \mathbb{Z}_+$ and local
sampling times $\tau_i \in \mathbb{R}_+$.
22.4.1 Exchange Local Measurements
22.4.1.1 Synchronized Sampling Instants 


The first solutions on distributed KFs proposed to share local measurements; see, for example, the
methods presented in [11,23-25]. Local measurements are often assumed to be independent
(uncorrelated). Therefore, they are easily merged with any existing estimate in a particular node.
To reduce complexity even further, most methods do not exchange the actual measurement but
rewrite $y_i$, $C_i$, and $V_i$ into an information form, that is,

$$z_i[k] := C_i^\top V_i^{-1} y_i[k] \quad\text{and}\quad Z_i[k] := C_i^\top V_i^{-1} C_i, \quad \forall i \in \mathcal{N}. \tag{22.5}$$


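Established terms for $z_i[k] \in \mathbb{R}^n$ and $Z_i[k] \in \mathbb{R}^{n \times n}$ are the information vector and information
matrix, respectively. They are used in an alternative KF algorithm with equivalent estimation
results but different computational complexity, known as the information filter. To that end, let
us introduce the information filtering function $f_{IF} : \mathbb{R}^n \times \mathbb{R}^{n \times n} \times \mathbb{R}^{n \times n} \times \mathbb{R}^{n \times n} \times \mathbb{R}^n \times \mathbb{R}^{n \times n} \to
\mathbb{R}^n \times \mathbb{R}^{n \times n}$, for $z[k] := C^\top V^{-1} y[k]$ and $Z[k] := C^\top V^{-1} C$ as the information form of the
generalized measurement $y[k]$ expressed in (22.2), that is,

$$(\hat{x}[k], P[k]) = f_{IF}(\hat{x}[k-1], P[k-1], A_\tau, Q_\tau, z[k], Z[k]), \tag{22.6}$$

$$
\begin{aligned}
\text{with}\quad M &= A_\tau P[k-1] A_\tau^\top + Q_\tau;\\
P[k] &= (M^{-1} + Z[k])^{-1};\\
\hat{x}[k] &= P[k]\,(M^{-1} A_\tau \hat{x}[k-1] + z[k]).
\end{aligned}
\tag{22.7}
$$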
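The claim that the information filter yields the same estimate as the KF can be checked numerically. The sketch below implements (22.7) next to a compact covariance-form reference step and compares the two on one hypothetical measurement; the names `f_if` and `f_kf` and all numbers are illustrative assumptions.

```python
import numpy as np

def f_kf(x_prev, P_prev, A, Q, y, C, V):
    """Covariance-form KF step of (22.4), used here only as a reference."""
    M = A @ P_prev @ A.T + Q
    K = M @ C.T @ np.linalg.inv(C @ M @ C.T + V)
    x_hat = A @ x_prev + K @ (y - C @ A @ x_prev)
    return x_hat, (np.eye(len(x_prev)) - K @ C) @ M

def f_if(x_prev, P_prev, A, Q, z, Z):
    """Information-filter step of (22.7)."""
    M_inv = np.linalg.inv(A @ P_prev @ A.T + Q)
    P = np.linalg.inv(M_inv + Z)
    x_hat = P @ (M_inv @ A @ x_prev + z)
    return x_hat, P

A = np.array([[1.0, 0.5], [0.0, 1.0]])
Q = 0.05 * np.eye(2)
C = np.array([[1.0, 0.0]])
V = np.array([[0.2]])
x_prev, P_prev = np.array([0.5, -0.2]), np.eye(2)
y = np.array([0.9])

# Information form of the measurement, as in (22.5).
V_inv = np.linalg.inv(V)
z, Z = C.T @ V_inv @ y, C.T @ V_inv @ C

x_kf, P_kf = f_kf(x_prev, P_prev, A, Q, y, C, V)
x_if, P_if = f_if(x_prev, P_prev, A, Q, z, Z)
```

The two steps differ in which matrices are inverted: the KF inverts an m-by-m matrix, whereas the information filter inverts n-by-n matrices, which is one reason a node might prefer one over the other.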


Notice that a node i can choose between $f_{KF}$ and $f_{IF}$ for computing a local estimate of x.
This choice depends on the format in which nodes share their local measurement information,
that is, the normal form $(y_i, C_i, V_i)$ or the information form $(z_i, Z_i)$, as well as the computational
requirements of $f_{KF}$ and $f_{IF}$. In addition, note that when the original KF is employed by a node
i, that is, $(\hat{x}_i[k], P_i[k]) = f_{KF}(\hat{x}_i[k-1], P_i[k-1], A_\tau, Q_\tau, \bar{y}_i[k], \bar{C}_i, \bar{V}_i)$, then $\bar{y}_i[k]$ is constructed
by stacking $y_i[k]$ with the received $y_j[k]$ column-wise,* for all $j \in \mathcal{N}_i$. However, the distributed
KF proposed in [11] showed that the administration required to construct $\bar{y}_i$, $\bar{C}_i$, and $\bar{V}_i$ can be
simplified into an addition when local measurements are exchanged in their information form
instead. This implies that each node i performs the following function, which is also schematically
depicted in Figure 22.4, that is,

$$
\begin{aligned}
(\hat{x}_i[k], P_i[k]) &= f_{IF}(\hat{x}_i[k-1], P_i[k-1], A_\tau, Q_\tau, \bar{z}_i[k], \bar{Z}_i[k]),\\
\text{with}\quad \bar{z}_i[k] &= z_i[k] + \sum_{j \in \mathcal{N}_i} z_j[k] \quad\text{and}\quad \bar{Z}_i[k] = Z_i[k] + \sum_{j \in \mathcal{N}_i} Z_j[k].
\end{aligned}
\tag{22.8}
$$
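To see why stacking reduces to an addition, the following Python sketch compares the summed information contributions of (22.8) with the information form of the stacked measurement. It assumes uncorrelated measurement noise between nodes (so that the stacked noise covariance is block diagonal); the helper names and numbers are hypothetical.

```python
import numpy as np

def information_form(y, C, V):
    """Return (z, Z) as defined in (22.5)."""
    V_inv = np.linalg.inv(V)
    return C.T @ V_inv @ y, C.T @ V_inv @ C

def block_diag(blocks):
    """Place square blocks on the diagonal of a zero matrix."""
    size = sum(b.shape[0] for b in blocks)
    out = np.zeros((size, size))
    pos = 0
    for b in blocks:
        m = b.shape[0]
        out[pos:pos + m, pos:pos + m] = b
        pos += m
    return out

# Hypothetical measurements (y, C, V) of node i and two neighbours, n = 2.
measurements = [
    (np.array([1.0]), np.array([[1.0, 0.0]]), np.array([[0.5]])),
    (np.array([0.4]), np.array([[0.0, 1.0]]), np.array([[0.2]])),
    (np.array([1.3, 0.5]), np.eye(2), np.diag([1.0, 2.0])),
]

# Addition of information contributions, as in (22.8).
pairs = [information_form(y, C, V) for y, C, V in measurements]
z_bar = sum(z for z, _ in pairs)
Z_bar = sum(Z for _, Z in pairs)

# Reference: stack the measurements first, then convert.
y_stack = np.concatenate([y for y, _, _ in measurements])
C_stack = np.vstack([C for _, C, _ in measurements])
V_stack = block_diag([V for _, _, V in measurements])
z_ref, Z_ref = information_form(y_stack, C_stack, V_stack)
```

The summed quantities always have dimension n, regardless of how many neighbours contribute, which keeps the bookkeeping in a node independent of its number of neighbours.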


This simple, yet effective, distributed KF triggered many novel extensions, for example, to
reduce communication requirements by quantization of the measurement values, as presented
in [17], or to estimate only a part of the global state vector x in a node i, for example, [26,27].
However, a drawback when exchanging measurements is that node i receives localized data from
the neighboring nodes $j \in \mathcal{N}_i$. Hence, only a part of the measurements produced by the sensor
network is used for computing $\hat{x}_i$ and $P_i$. A solution to exploit more measurement information,
as proposed in [14,15], is to attain a consensus on local measurements. This means that, before
$f_{IF}$ is performed, each node i first employs a distributed consensus algorithm on $z_i[k]$ and $Z_i[k]$,
for all $i \in \mathcal{N}$. Some popular consensus algorithms are found in [28-31]. However, they require
that neighboring nodes exchange data multiple times in between two sampling instants. Due to
this demanding requirement, distributed KFs with a consensus on local measurements are not
very popular. Other extensions of the distributed KF presented in (22.8) take into account that
the sampling instants of individual nodes can differ throughout the network. As this is also the
case for the considered network, let us discuss the extension for asynchronous measurements
next.

* Parameters $\bar{C}_i$ and $\bar{V}_i$ can be constructed similarly to $\bar{y}_i$.

Figure 22.4 Schematic setup of a node's local algorithm for estimating the state x according to
a distributed KF where local measurements are exchanged in their information form.


22.4.1.2 Asynchronous Sampling Instants 


The assumed sensor network of Section 22.3 has different sampling instants per node. This means
that the $k_i$th sample of node i, which corresponds to its local sampling instant $t_{k_i} \in \mathbb{R}_+$, will
probably not be equal to the time $t \in \mathbb{R}$ at which a neighboring node $j \in \mathcal{N}_i$ sends $(z_j(t), Z_j(t))$.
To address this issue, let us assume that node i received $(z_j(t), Z_j(t))$ at time instant $t \in \mathbb{R}_{(t_{k_i-1},\, t_{k_i}]}$.
Then, this received measurement information is first "predicted" toward the local sampling instant
$t_{k_i}$, so that it can be used when node i runs its local estimation function $f_{IF}$. The results of [32]
characterize such a prediction, for all $j \in \mathcal{N}_i$ and $t \in \mathbb{R}_{(t_{k_i-1},\, t_{k_i}]}$, in terms of a predicted
information vector $z_j[k_i|t]$ and a predicted information matrix $Z_j[k_i|t]$:

[Equation (22.9): expressions for $z_j[k_i|t]$ and $Z_j[k_i|t]$ with their auxiliary definitions; not legible in this copy, see [32].]

Further, note that a node $j \in \mathcal{N}_i$ may have sent multiple data packages in between $t_{k_i-1}$ and $t_{k_i}$ with
local measurement information, for example, when node j has a smaller sampling time than
node i.

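The (predicted) measurement of (22.9) in information form can directly be used by an information
filter. This means that the values of $\hat{x}_i[k_i]$ and $P_i[k_i]$ are updated at the local sampling instant
$t_{k_i}$ of node i according to an algorithm that is similar to the one presented in (22.8), that is,

$$
\begin{aligned}
(\hat{x}_i[k_i], P_i[k_i]) &= f_{IF}(\hat{x}_i[k_i-1], P_i[k_i-1], A_{\tau_i}, Q_{\tau_i}, \bar{z}_i[k_i], \bar{Z}_i[k_i]),\\
\text{with}\quad \bar{z}_i[k_i] &= z_i[k_i] + \sum_{j \in \mathcal{N}_i} z_j[k_i|t], \quad \forall t \in \mathbb{R}_{(t_{k_i-1},\, t_{k_i}]},\\
\bar{Z}_i[k_i] &= Z_i[k_i] + \sum_{j \in \mathcal{N}_i} Z_j[k_i|t], \quad \forall t \in \mathbb{R}_{(t_{k_i-1},\, t_{k_i}]}.
\end{aligned}
\tag{22.10}
$$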


Note that the earlier-mentioned information filter assumes that local measurements are exchanged
in the information form. A solution when nodes exchange local measurements in their normal
form, that is, $(y_j, C_j, V_j)$, is to employ the Kalman filtering function $f_{KF}$ for each time instant
$t \in \mathbb{R}_{(t_{k_i-1},\, t_{k_i}]}$ at which a new measurement is received. Such a procedure could reduce the
computational demands of a node, since the prediction formulas of (22.9) are complex. Nonetheless,
incorporation of local measurements $y_j(t)$ that are not sampled at the predefined sampling instants
$t_{k_i}$ requires much attention from the management layer of the individual node i. A more natural
solution to this problem is obtained in distributed KFs that exchange local estimates instead of
local measurements, which are presented next.


22.4.2 Exchange Local Estimates 
22.4.2.1 Synchronous Sampling Instants 


The main advantage of exchanging local estimates is that measurement information spreads through
the entire network, even under the condition that nodes exchange data only once per sampling
instant. However, since local estimation results are exchanged, nodes require a method
that can merge multiple estimates of the same state x into a single estimate. Various solutions of
such methods are found in literature. However, before addressing these methods, let us start by
presenting the generalized estimation algorithm performed by each node i that corresponds to this
type of distributed KF solutions.

Typically, solutions of distributed KF that exchange local estimates first merge the local measurement
$y_i[k]$ with the previous local estimate $p_i(x[k-1])$ via a KF and thereby compute the
updated estimate $p_i(x[k])$. This updated local estimate is then shared with neighboring nodes, due
to which node i will receive the local estimates of nodes $j \in \mathcal{N}_i$. It will be shown that not every solution
requires sharing both the locally estimated mean and its corresponding error-covariance.
Therefore, let us introduce the set of received means at node i as $\mathcal{X}_i \subset \mathbb{R}^n$ and a corresponding set of
received error-covariances as $\mathcal{P}_i \subset \mathbb{R}^{n \times n}$, that is,

$$\mathcal{X}_i[k] := \{\hat{x}_j[k] \in \mathbb{R}^n \mid j \in \mathcal{N}_i\}, \tag{22.11}$$
$$\mathcal{P}_i[k] := \{P_j[k] \in \mathbb{R}^{n \times n} \mid j \in \mathcal{N}_i\}. \tag{22.12}$$


The earlier-mentioned information of the local estimation results at neighboring nodes, together
with the node's own local estimate, that is, $\hat{x}_i[k]$ and $P_i[k]$, will be used as input to a merging
function. More precisely, let us introduce this merging function $\Omega : \mathbb{R}^n \times \mathbb{R}^{n \times n} \times \mathbb{R}^n \times \mathbb{R}^{n \times n} \to
\mathbb{R}^n \times \mathbb{R}^{n \times n}$, which results in the merged Gaussian estimate $p_{i^+}(x[k]) := G(x[k], \hat{x}_{i^+}[k], P_{i^+}[k])$,
as follows:

$$(\hat{x}_{i^+}[k], P_{i^+}[k]) = \Omega(\hat{x}_i[k], P_i[k], \mathcal{X}_i[k], \mathcal{P}_i[k]). \tag{22.13}$$

Figure 22.5 Schematic setup of a node's local algorithm for estimating the state x according to
a distributed KF where local estimates are exchanged.


Then, the generalized local algorithm performed by a node $i \in \mathcal{N}$ for estimating the state, which
is also depicted in the schematic setup of Figure 22.5, yields

$$
\begin{aligned}
&(\hat{x}_i[k], P_i[k]) = f_{KF}(\hat{x}_{i^+}[k-1], P_{i^+}[k-1], A_\tau, Q_\tau, y_i[k], C_i, V_i);\\
&\text{share } (\hat{x}_i[k], P_i[k]) \text{ with all } j \in \mathcal{N}_i;\\
&\text{collect } (\hat{x}_j[k], P_j[k]) \text{ for all } j \in \mathcal{N}_i;\\
&(\hat{x}_{i^+}[k], P_{i^+}[k]) = \Omega(\hat{x}_i[k], P_i[k], \mathcal{X}_i[k], \mathcal{P}_i[k]).
\end{aligned}
\tag{22.14}
$$

Note that a suitable strategy for the merging function $\Omega(\cdot,\cdot,\cdot,\cdot)$ is yet to be determined.
Literature indicates that one can choose between three types of strategies: consensus, fusion, and
a combination of the two. A detailed account of these three strategies is presented next, starting
with consensus.

Consensus strategies aim to reduce conflicting results of the locally estimated means $\hat{x}_i$, for
all $i \in \mathcal{N}$. Such an objective makes sense, as $\hat{x}_i$ in the different nodes i of the network is a local
representative of the same global state x. Many distributed algorithms for attaining a consensus (or
the average) were proposed, which all aim to diminish the difference $\hat{x}_i[k] - \hat{x}_j[k]$, for any two
$i, j \in \mathcal{N}$. See, for example, the distributed consensus methods proposed in [28-31]. The general
idea is to perform a weighted averaging cycle in each node i on the local and neighboring means.
To that end, let $W_{ij} \in \mathbb{R}^{n \times n}$, for all $j \in \mathcal{N}_i$, denote some weighting matrices. Then, a consensus
merging function $\Omega(\cdot,\cdot,\cdot,\cdot)$ is typically characterized as follows:

$$
\begin{aligned}
(\hat{x}_{i^+}[k], P_{i^+}[k]) &= \Omega(\hat{x}_i[k], P_i[k], \mathcal{X}_i[k], \mathcal{P}_i[k]),\\
\text{with}\quad \hat{x}_{i^+}[k] &= \Big(I_n - \sum_{\hat{x}_j[k] \in \mathcal{X}_i[k]} W_{ij}\Big)\hat{x}_i[k] + \sum_{\hat{x}_j[k] \in \mathcal{X}_i[k]} W_{ij}\,\hat{x}_j[k],\\
P_{i^+}[k] &= P_i[k].
\end{aligned}
\tag{22.15}
$$

Note that the previously mentioned consensus merging function is limited to the means and that
the error-covariance of a node is not updated, due to which $\mathcal{P}_i[k]$ can be the empty set. Further,

most research on consensus methods concentrates on finding suitable values for the weights $W_{ij}$,
for all $j \in \mathcal{N}_i$. Some typical examples of scalar weights were proposed in [28,31], where $d_i := |\mathcal{N}_i|$
(number of elements within the set $\mathcal{N}_i$) and $\epsilon < \min(d_1, \ldots, d_N)$, that is,

Nearest neighboring weights  $W_{ij} := (1 + d_i)^{-1}$, $\forall j \in \mathcal{N}_i$;
Maximum degree weights  $W_{ij} := (1 + \epsilon)^{-1}$, $\forall j \in \mathcal{N}_i$;
Metropolis weights  $W_{ij} := (1 + \max\{d_i, d_j\})^{-1}$, $\forall j \in \mathcal{N}_i$.
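The averaging cycle of (22.15) with Metropolis weights can be sketched in a few lines of Python. The three-node line graph, the initial means, and the repetition of the cycle over 50 rounds are assumptions chosen only to make the convergence visible; in the chapter's setting a node would run one cycle per sampling instant.

```python
import numpy as np

def metropolis_weight(d_i, d_j):
    """Scalar Metropolis weight for the link between nodes i and j."""
    return 1.0 / (1.0 + max(d_i, d_j))

def consensus_merge(x_i, neighbour_means, weights):
    """Averaging cycle of (22.15) for scalar weights W_ij."""
    x_plus = (1.0 - sum(weights)) * x_i
    for w, x_j in zip(weights, neighbour_means):
        x_plus = x_plus + w * x_j
    return x_plus

# Hypothetical line graph 0 - 1 - 2 with conflicting local means.
neighbours = {0: [1], 1: [0, 2], 2: [1]}
means = {0: np.array([0.0, 0.0]),
         1: np.array([3.0, 0.0]),
         2: np.array([6.0, 3.0])}

for _ in range(50):
    # All nodes update synchronously from the previous round's means.
    means = {
        i: consensus_merge(
            means[i],
            [means[j] for j in neighbours[i]],
            [metropolis_weight(len(neighbours[i]), len(neighbours[j]))
             for j in neighbours[i]],
        )
        for i in neighbours
    }
```

Each Metropolis weight needs only the degrees of the two nodes on a link, so no network-wide information is required to compute it.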
An analysis on the effects of these weights, when they are employed by the consensus function
in (22.15), was presented in [28,31]. Therein, it was shown that employing nearest neighboring
weights in (22.15) results in a bias on $\lim_{k \to \infty} \hat{x}_{i^+}[k]$. This is prevented by employing maximum
degree weights or Metropolis weights. However, maximum degree weights require global information
to establish $\epsilon$ in every node, which reduces their applicability in sensor networks.

Employing a consensus strategy for merging the local estimates of neighboring nodes is very
popular in distributed KFs. As a result, many extensions of the preceding solution are found in
literature. A common extension is to perform the averaging cycle not only on the means $\hat{x}_i[k]$ and
$\hat{x}_j[k]$, as characterized in (22.15), but also on the error-covariances $P_i[k]$ and $P_j[k]$ of neighboring
nodes. See, for example, the distributed KF proposed in [33] and a related solution presented in
[34]. It is worth pointing out that an in-depth study on distributed KFs with a consensus on local
estimates is presented in [35]. Therein, it is shown that minimization of the estimation error by
jointly optimizing the Kalman gain K of $f_{KF}$ and the weights $W_{ij}$ of $\Omega$ is a non-convex problem.
Hence, choosing the value of the Kalman gain K affects the weights $W_{ij}$, for all $j \in \mathcal{N}_i$, which
raised new challenges. A solution for joint optimization on K and $W_{ij}$ was introduced in [36] as
the distributed consensus information filter. However, a drawback of any consensus method is that
the local error-covariance $P_i[k]$ is not taken into account when deriving the weights $W_{ij}$, for all
$j \in \mathcal{N}_i$. The error-covariance is an important variable that represents a model for the estimation
error $\mathrm{cov}(x[k] - \hat{x}_i[k])$. Therefore, merging two local estimates $p_i(x[k])$ and $p_j(x[k])$ in line with
their individual error-covariance implies that one can choose the value of $W_{ij}$ such that the result
after merging, that is, $p_{i^+}(x[k])$, is mainly based on the local estimate with the least estimation
error. This idea is in fact the fundamental difference between a consensus approach and a fusion
strategy. In fusion, both error-covariances $P_i[k]$ and $P_j[k]$ are explicitly taken into account when
merging $p_i(x[k])$ and $p_j(x[k])$, as is indicated in the next alternative merging function based
on fusion.

Fusion-consensus strategies is a label for characterizing some initial fusion solutions that are
based on the fusion strategy covariance intersection, which was introduced in [37]. Fusion strategies
typically define an algorithm to merge two prior estimates $p_i(x[k])$ and $p_j(x[k])$ into a single,
"fused" estimate. Some fundamental fusion methods presented in [25,38] require that the correlation
of the two prior estimates is available. In (self-organizing) sensor networks, one cannot impose such
a requirement, as it amounts to keeping track of shared data between all nodes in the network.
Therefore, this overview considers fusion methods that can cope with unknown correlations. A
popular fusion method for unknown correlations is covariance intersection. The reason that this
method is referred to as a fusion-consensus strategy is that the fusion formula of covariance
intersection is similar to the averaging cycle of (22.15) in consensus approaches. The method
characterizes the fused estimate as a convex combination of the two prior ones. As an example, let
us assume that node i has only one neighboring node j. Then employment of covariance intersection
to characterize $\Omega(\cdot,\cdot,\cdot,\cdot)$ of (22.14) as a fusion function, for some $W_{ij} \in \mathbb{R}_{[0,1]}$, yields

$$
\begin{aligned}
P_{i^+}[k] &= \big((1 - W_{ij})\,P_i^{-1}[k] + W_{ij}\,P_j^{-1}[k]\big)^{-1},\\
\hat{x}_{i^+}[k] &= P_{i^+}[k]\,\big((1 - W_{ij})\,P_i^{-1}[k]\,\hat{x}_i[k] + W_{ij}\,P_j^{-1}[k]\,\hat{x}_j[k]\big).
\end{aligned}
$$

Note that the preceding formulas indicate that the error-covariances $P_i[k]$ and $P_j[k]$ are explicitly
taken into account when merging $\hat{x}_i[k]$ and $\hat{x}_j[k]$. Moreover, even the weight $W_{ij}$ is typically
based on these error-covariances, for example, $W_{ij} = \mathrm{tr}(P_i[k])\big(\mathrm{tr}(P_i[k]) + \mathrm{tr}(P_j[k])\big)^{-1}$, with
some other examples found in [39-41]. As a result, the updated estimate $p_{i^+}(x[k])$ computed by
the merging function $\Omega$ will be closer to the prior estimate $p_i(x[k])$ or $p_j(x[k])$ that is "the most
accurate one," that is, with a smaller error-covariance. An illustrative example of this property will
be given later on. For now, let us continue with the merging function in case node i has more than
one neighboring node. Fusion of multiple estimates can be conducted recursively according to the
order of arrival at a node. Therefore, the merging function $\Omega(\cdot,\cdot,\cdot,\cdot)$ based on the fusion method
covariance intersection has the following characterization:

$$
\begin{aligned}
&(\hat{x}_{i^+}[k], P_{i^+}[k]) = \Omega(\hat{x}_i[k], P_i[k], \mathcal{X}_i[k], \mathcal{P}_i[k]),\\
&\text{with: for each received estimate } (\hat{x}_j[k], P_j[k]) \text{, do}\\
&\qquad \Sigma_i = \big((1 - W_{ij})\,P_i^{-1}[k] + W_{ij}\,P_j^{-1}[k]\big)^{-1};\\
&\qquad \hat{x}_i[k] = \Sigma_i\,\big((1 - W_{ij})\,P_i^{-1}[k]\,\hat{x}_i[k] + W_{ij}\,P_j^{-1}[k]\,\hat{x}_j[k]\big);\\
&\qquad P_i[k] = \Sigma_i;\\
&\text{end for}\\
&\hat{x}_{i^+}[k] = \hat{x}_i[k], \quad P_{i^+}[k] = P_i[k].
\end{aligned}
\tag{22.16}
$$
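The recursive loop of (22.16) can be sketched in Python as follows. The weight rule used here, a coarse grid search for the smallest trace of the fused covariance, is an assumption made for this example (a common choice for covariance intersection) and is not the trace-ratio weight mentioned in the text; all names and numbers are hypothetical.

```python
import numpy as np

def covariance_intersection(x_i, P_i, x_j, P_j, w):
    """Fuse two estimates with weight w in [0, 1] (two-node formulas)."""
    P_i_inv, P_j_inv = np.linalg.inv(P_i), np.linalg.inv(P_j)
    Sigma = np.linalg.inv((1.0 - w) * P_i_inv + w * P_j_inv)
    x = Sigma @ ((1.0 - w) * P_i_inv @ x_i + w * P_j_inv @ x_j)
    return x, Sigma

def trace_minimizing_weight(P_i, P_j, grid=101):
    """Assumed weight rule: grid search for the smallest fused trace."""
    P_i_inv, P_j_inv = np.linalg.inv(P_i), np.linalg.inv(P_j)
    candidates = np.linspace(0.0, 1.0, grid)
    traces = [np.trace(np.linalg.inv((1.0 - w) * P_i_inv + w * P_j_inv))
              for w in candidates]
    return float(candidates[int(np.argmin(traces))])

def merge_ci(x_i, P_i, received):
    """Recursive merging in order of arrival, following the loop of (22.16)."""
    for x_j, P_j in received:
        w = trace_minimizing_weight(P_i, P_j)
        x_i, P_i = covariance_intersection(x_i, P_i, x_j, P_j, w)
    return x_i, P_i

# Node i is accurate in the second state, its neighbour in the first.
P_i = np.diag([4.0, 1.0])
x_plus, P_plus = merge_ci(np.array([0.0, 0.0]), P_i,
                          [(np.array([2.0, 2.0]), np.diag([1.0, 4.0]))])
```

In this example the fused covariance has variance 1.6 in both directions, which is larger than the variance 1 that node i already had in its second state; this is the kind of conservatism that the text attributes to covariance intersection.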


Although covariance intersection takes the exchanged error-covariances into account when merging
multiple estimates, it still introduces conservatism. Intuitively, one would expect that $p_{i^+}(x[k])$ is
more accurate than $p_i(x[k])$ and $p_j(x[k])$, for all $j \in \mathcal{N}_i$, as prior estimates of neighboring nodes
are merged. A formalization of this intuition is that $P_{i^+}[k] \preceq P_i[k]$ and $P_{i^+}[k] \preceq P_j[k]$ should
hold for all $j \in \mathcal{N}_i$. One can prove that covariance intersection does not satisfy this property, due to
which an alternative fusion method is presented next.

Fusion strategies aim to improve the accuracy after fusion, for which the basic fusion problem
is the same as previously mentioned, that is, merge two prior estimates $p_i(x[k])$ and $p_j(x[k])$
into a single, "fused" estimate $p_{i^+}(x[k])$, when correlations are unknown. Some existing fusion
methods are found in [42-44]. In this survey the ellipsoidal intersection fusion method of [42,43]
is presented, since it results in algebraic expressions of the fusion formulas. In brief, ellipsoidal
intersection derives an explicit characterization of the (unknown) correlation prior to deriving
algebraic fusion formulas that are based on the independent parts of $p_i(x[k])$ and $p_j(x[k])$. This
characterization of the correlation, for any two prior estimates $p_i(x[k])$ and $p_j(x[k])$, is represented
by the mutual covariance $\Gamma_{ij} \in \mathbb{R}^{n \times n}$ and the mutual mean $\gamma_{ij} \in \mathbb{R}^n$. Before algebraic expressions
of these variables are given, let us first present the resulting merging function $\Omega(\cdot,\cdot,\cdot,\cdot)$ when
ellipsoidal intersection is employed in this function for fusion:

$$
\begin{aligned}
&(\hat{x}_{i^+}[k], P_{i^+}[k]) = \Omega(\hat{x}_i[k], P_i[k], \mathcal{X}_i[k], \mathcal{P}_i[k]),\\
&\text{with: for each received estimate } (\hat{x}_j[k], P_j[k]) \text{, do}\\
&\qquad \Sigma_i = \big(P_i^{-1}[k] + P_j^{-1}[k] - \Gamma_{ij}^{-1}\big)^{-1};\\
&\qquad \hat{x}_i[k] = \Sigma_i\,\big(P_i^{-1}[k]\,\hat{x}_i[k] + P_j^{-1}[k]\,\hat{x}_j[k] - \Gamma_{ij}^{-1}\gamma_{ij}\big);\\
&\qquad P_i[k] = \Sigma_i;\\
&\text{end for}\\
&\hat{x}_{i^+}[k] = \hat{x}_i[k], \quad P_{i^+}[k] = P_i[k].
\end{aligned}
\tag{22.17}
$$
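The mutual mean $\gamma_{ij}$ and mutual covariance $\Gamma_{ij}$ are found by a singular value decomposition,
which is denoted as $(S, D, S^{-1}) = \mathrm{svd}(\Sigma)$ for a positive definite $\Sigma \in \mathbb{R}^{n \times n}$, a
diagonal $D \in \mathbb{R}^{n \times n}$ and a rotation matrix $S \in \mathbb{R}^{n \times n}$. As such, let us introduce the matrices
$D_i, D_j, S_i, S_j \in \mathbb{R}^{n \times n}$ via the singular value decompositions $(S_i, D_i, S_i^{-1}) = \mathrm{svd}(P_i[k])$ and
$(S_j, D_j, S_j^{-1}) = \mathrm{svd}\big(D_i^{-1/2} S_i^{-1} P_j[k] S_i D_i^{-1/2}\big)$. Then, an algebraic expression of $\gamma_{ij}$ and $\Gamma_{ij}$, for some
$\varsigma \in \mathbb{R}_+$ while $\{A\}_{qr} \in \mathbb{R}$ denotes the element of a matrix A on the qth row and rth column, yields

$$
\begin{aligned}
D_{\Gamma_{ij}} &= \mathrm{diag}_{q \in \mathbb{Z}_{[1,n]}}\big(\max[1, \{D_j\}_{qq}]\big),\\
\Gamma_{ij} &= S_i D_i^{1/2} S_j D_{\Gamma_{ij}} S_j^{-1} D_i^{1/2} S_i^{-1},\\
\gamma_{ij} &= \big(P_i^{-1} + P_j^{-1} - 2\Gamma_{ij}^{-1} + 2\varsigma I_n\big)^{-1}
\big((P_j^{-1} - \Gamma_{ij}^{-1} + \varsigma I_n)\,\hat{x}_i + (P_i^{-1} - \Gamma_{ij}^{-1} + \varsigma I_n)\,\hat{x}_j\big).
\end{aligned}
$$

A suitable value of $\varsigma$ follows: $\varsigma = 0$ if $|1 - \{D_j\}_{qq}| > 10\epsilon$, for all $q \in \mathbb{Z}_{[1,n]}$ and some $\epsilon \in \mathbb{R}_{>0}$,
while $\varsigma = \epsilon$ otherwise. The design parameter $\epsilon$ supports a numerically stable result of ellipsoidal
intersection.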
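One fusion step of ellipsoidal intersection can be sketched in Python as follows. The sketch assumes symmetric positive definite covariances, for which `np.linalg.eigh` is used in place of the singular value decomposition (the two coincide up to ordering in that case); the function name, the default value of `eps`, and the numerical example are hypothetical.

```python
import numpy as np

def ellipsoidal_intersection(x_i, P_i, x_j, P_j, eps=1e-6):
    """Fuse two estimates via the mutual mean and mutual covariance."""
    I = np.eye(len(x_i))
    # P_i = S_i D_i S_i^T; S_i is orthogonal, so its inverse is S_i^T.
    d_i, S_i = np.linalg.eigh(P_i)
    D_i_sqrt = np.diag(np.sqrt(d_i))
    D_i_isqrt = np.diag(1.0 / np.sqrt(d_i))
    # Decompose P_j in the coordinates in which P_i is the identity.
    d_j, S_j = np.linalg.eigh(D_i_isqrt @ S_i.T @ P_j @ S_i @ D_i_isqrt)
    D_gamma = np.diag(np.maximum(1.0, d_j))
    Gamma = S_i @ D_i_sqrt @ S_j @ D_gamma @ S_j.T @ D_i_sqrt @ S_i.T
    # Regularization: zero unless some eigenvalue of D_j is close to 1.
    zeta = 0.0 if np.all(np.abs(1.0 - d_j) > 10.0 * eps) else eps
    P_i_inv, P_j_inv, G_inv = (np.linalg.inv(M) for M in (P_i, P_j, Gamma))
    gamma = np.linalg.solve(
        P_i_inv + P_j_inv - 2.0 * G_inv + 2.0 * zeta * I,
        (P_j_inv - G_inv + zeta * I) @ x_i + (P_i_inv - G_inv + zeta * I) @ x_j,
    )
    Sigma = np.linalg.inv(P_i_inv + P_j_inv - G_inv)
    x = Sigma @ (P_i_inv @ x_i + P_j_inv @ x_j - G_inv @ gamma)
    return x, Sigma

# Node i is accurate in the second state, its neighbour in the first.
P_i, P_j = np.diag([4.0, 1.0]), np.diag([1.0, 4.0])
x_f, P_f = ellipsoidal_intersection(np.array([0.0, 0.0]), P_i,
                                    np.array([2.0, 2.0]), P_j)
```

For these inputs the fused covariance is the identity, so it is no larger than either prior covariance, and the fused mean takes the first component from the neighbour and the second from node i, each from the estimate that is more accurate in that direction.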

This completes the three alternatives that can be employed by the merging function $\Omega(\cdot,\cdot,\cdot,\cdot)$.
Before continuing with an extension of this merging function toward asynchronous sampling
instants, let us first present an illustrative comparison of the two fundamentally different approaches.
This comparison is depicted in Figure 22.6, which is established when $p_i(x[k])$ and
$p_j(x[k])$ are merged by either a fusion or a consensus approach. The consensus result is computed
with the averaging cycle of (22.15) and $W_{ij} = 0.1$. Recall that only the means $\hat{x}_i[k]$ and $\hat{x}_j[k]$
are synchronized and not their error-covariances. The fusion result is computed with ellipsoidal
intersection of (22.17). Let us further point out that Figure 22.6 is not included to decide which
method is better. It is merely an example to illustrate the goal of consensus (reduce conflicting
results) with respect to the goal of fusion (reduce uncertainty).


[Figure 22.6: two panels, "Fusion" and "Consensus", each showing the sub-level-sets of $p_i(x)$, $p_j(x)$, and $p_{i^+}(x)$ on axes ranging from -1 to 3.]

Figure 22.6 A comparison of consensus versus fusion. Note that PDFs are represented as
ellipsoidal sub-level-sets, that is, $G(x, \mu, \Sigma) \to \mathcal{E}_{\mu,\Sigma}$. A graphical characterization of such a
sub-level-set is found in Figure 22.2, though let us point out that a larger covariance $\Sigma$ implies a
larger area size of $\mathcal{E}_{\mu,\Sigma}$.
22.4.2.2 Asynchronous Sampling Instants


The assumed networked system of Section 22.3 has different sampling instants per node. This
implies that the $k_i$th sample of node i, which corresponds to the sampling instant $t_{k_i} \in \mathbb{R}_+$, will
probably not be equal to the time $t \in \mathbb{R}$ at which a neighboring node $j \in \mathcal{N}_i$ sends $(\hat{x}_j(t), P_j(t))$.
Compared to exchanging measurements, asynchronous sampling instants can be addressed more
easily for distributed KF solutions that exchange local estimates. More precisely, the received
variables $(\hat{x}_j(t), P_j(t))$ should be predicted from time t toward the sampling instant $t_{k_i}$, that is,

$$
\begin{aligned}
\hat{x}_j[k_i|t] &:= A_{t_{k_i}-t}\,\hat{x}_j(t), && \forall j \in \mathcal{N}_i,\ t \in \mathbb{R}_{(t_{k_i-1},\, t_{k_i}]},\\
P_j[k_i|t] &:= A_{t_{k_i}-t}\,P_j(t)\,A_{t_{k_i}-t}^\top + Q_{t_{k_i}-t}, && \forall j \in \mathcal{N}_i,\ t \in \mathbb{R}_{(t_{k_i-1},\, t_{k_i}]}.
\end{aligned}
\tag{22.18}
$$

Then, solutions of distributed Kalman filtering that are in line with the setup depicted in (22.14)
can cope with asynchronous sampling instants by redefining $\mathcal{X}_i[k_i]$ and $\mathcal{P}_i[k_i]$ as the collection of
the preceding predicted means $\hat{x}_j[k_i|t]$ and error-covariances $P_j[k_i|t]$, for all $j \in \mathcal{N}_i$.
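The prediction of (22.18) amounts to one time-update over the interval between reception and the local sampling instant. A minimal Python sketch is given below; the function names, the constant-velocity model, and the noise intensity `q` are assumptions chosen for illustration.

```python
import numpy as np

def constant_velocity(tau, q=0.1):
    """Hypothetical model: returns (A_tau, Q_tau) for state (position, velocity)."""
    A = np.array([[1.0, tau], [0.0, 1.0]])
    B = np.array([[tau, 0.5 * tau**2], [0.0, tau]])  # integral of exp(F eta)
    Q = B @ np.diag([0.0, q]) @ B.T
    return A, Q

def predict_received(x_j, P_j, t, t_ki, model):
    """Predict an estimate received at time t toward the sampling instant t_ki."""
    A, Q = model(t_ki - t)
    return A @ x_j, A @ P_j @ A.T + Q

# Estimate of a neighbour received 0.3 time units before the local sample.
x_j, P_j = np.array([1.0, 2.0]), np.eye(2)
x_pred, P_pred = predict_received(x_j, P_j, t=0.7, t_ki=1.0,
                                  model=constant_velocity)
```

Each received estimate is predicted with its own interval length, so packages that arrive at different times within one sampling period can all be brought to the same local instant before merging.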

This completes the overview on distributed Kalman filtering, in which nodes can adopt a strategy 
that exchanges local measurements or local estimates. Next, existing self-organization methods are 
presented, though an analysis of the required resources for estimation is studied first. 


22.5 Required Resources 


The distributed KFs presented in the previous section are typically proposed for static sensor 
networks. However, the focus of this chapter is to extent those methods for sensor networks that 
have to deal with changes in the networked system. To cope with these changes, nodes must be 
able to adapt the conditions of their local estimation algorithm, or even choose a local algorithm 
that is based on a different type of distributed KF. In order to carry out these reconfiguration 
processes, certain design decisions should be made in runtime depending on the available resources 
(e.g., how to reassign the KF tasks in case of node failures, what type of KF algorithms are feasible 
to run under given communication constraints, etc.). Therefore, this section presents a summary 
of the required resources for the different distributed KF strategies. Important resources in sensor 
networks are communication and computation. Let us start by addressing the communication 
demand of a node î. Section 22.4 indicates that there are three different types of data packages that 
a node can exchange, that is, the local measurement y; € IR” in normal form or information form, 
and the local estimate of x € R”. The resulting communication demands of node i that correspond 
to these different data packages are listed in Table 22.1. 

Next, let us indicate the computational demand of a node $i$ by presenting the algorithmic 
complexity of the different functionalities that can be chosen to compute $p_i(x)$. This complexity 
is expressed as the number of floating-point operations, which depends on the size of the local measurements 
$y_i \in \mathbb{R}^{m_i}$ and of the state vector $x \in \mathbb{R}^{n}$. To that end, the following properties on the computational 
complexities of basic matrix computations are used: 


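The summation/subtraction of $A \in \mathbb{R}^{q \times r}$ with $B \in \mathbb{R}^{q \times r}$ requires $O(qr)$ operations. 

The product of $A \in \mathbb{R}^{q \times r}$ times $B \in \mathbb{R}^{r \times p}$ requires $O(qrp)$ operations. 

The inverse of an invertible matrix $A \in \mathbb{R}^{q \times q}$ requires $O(q^3)$ operations. 

The singular value decomposition of $A \in \mathbb{R}^{q \times q}$ requires $O(12q^3)$ operations. 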
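
As a rough illustration of how such per-operation counts combine (a sketch that is not taken from the chapter), consider the standard Kalman gain $K = P C^\top (C P C^\top + V)^{-1}$. It is assumed here that $P \in \mathbb{R}^{n \times n}$, $C \in \mathbb{R}^{M \times n}$, and $V \in \mathbb{R}^{M \times M}$, where $M$ is a hypothetical number of stacked measurements:

```latex
\begin{align*}
P C^\top &: O(n^2 M), \\
C \,(P C^\top) &: O(n M^2), \\
C P C^\top + V &: O(M^2), \\
(C P C^\top + V)^{-1} &: O(M^3), \\
(P C^\top)\,(C P C^\top + V)^{-1} &: O(n M^2), \\
\text{total} &: O(n^2 M + 2 n M^2 + M^3 + M^2).
\end{align*}
```

Under these assumptions, the gain computation alone grows cubically with the number of measurements that a node processes.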

Table 22.1 Communication Demand in the Amount of Elements (Floating Points) That Is Exchanged by Each Node Depending on the Data Shared 

Exchanged Data | Communication Demand 
$(y_i, C_i, V_i)$ | $m_i^2 + m_i + n m_i$ 
$(z_i, Z_i)$ | $n^2 + n$ 
$(\hat{x}_i, P_i)$ | $n^2 + n$ 


Table 22.2 Computational Demand in the Amount of Floating-Point Operations Depending on the Employed Functionality, Where $M_i := m_i + \sum_{j \in \mathcal{N}_i} m_j$ 

Functionality | Computational Demand 
$f_{KF}$ | $\approx O(4n^3 + 3M_i n^2 + 2nM_i^2 + M_i^3)$ 
$f_{IF}$ | $\approx O(3n^3 + M_i n^2 + nM_i^2)$ 
$\Omega$ consensus of (22.15) | $\approx O(3n + M_i + 1)$ 
$\Omega$ fusion-consensus of (22.16) | $\approx O(3n^3 + 9n^2)$ 
$\Omega$ fusion of (22.17) | $\approx O(31n^3 + 7n^2)$ 


Then, the resulting computational complexities of the Kalman filtering functions $f_{KF}$ and $f_{IF}$ and of 
the three merging functions $\Omega$, that is, characterized by a consensus, fusion-consensus, and fusion 
strategy, are listed in Table 22.2. 
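
The following Python sketch shows how such operation counts could be used to test which combinations of functionalities fit a per-sample budget. It assumes that the entries of Table 22.2 read as coded below for the two filtering functions and for the fusion-consensus and fusion strategies; the consensus entry is left out to keep the sketch short. The names `FILTER_FLOPS`, `MERGE_FLOPS`, the "none" merge option, and the budget parameter are illustrative assumptions.

```python
# Assumed leading-order operation counts; M is the number of stacked measurements M_i.
FILTER_FLOPS = {
    "KF": lambda n, M: 4 * n**3 + 3 * M * n**2 + 2 * n * M**2 + M**3,
    "IF": lambda n, M: 3 * n**3 + M * n**2 + n * M**2,
}
MERGE_FLOPS = {
    "none": lambda n: 0,                                  # hypothetical: no merging step
    "fusion-consensus": lambda n: 3 * n**3 + 9 * n**2,
    "fusion": lambda n: 31 * n**3 + 7 * n**2,
}


def flops_per_sample(n, M, filter_kind, merge_kind="none"):
    """Operation count of one local estimation cycle for the chosen functionalities."""
    return FILTER_FLOPS[filter_kind](n, M) + MERGE_FLOPS[merge_kind](n)


def feasible_options(n, M, budget):
    """All (filter, merge) pairs whose operation count fits the per-sample budget."""
    return [
        (f, g)
        for f in FILTER_FLOPS
        for g in MERGE_FLOPS
        if flops_per_sample(n, M, f, g) <= budget
    ]
```

A node could call `feasible_options` with its current state size, measurement count, and processing budget to obtain the candidates from which a reconfiguration step may choose.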

The next section makes use of Tables 22.1 and 22.2 to decide what type of data should be 
exchanged between neighboring nodes and which functionalities should be followed in the local 
estimation algorithm of a node. 


22.6 Self-Organizing Solutions 


The design challenge of any embedded system is to realize given functionalities, in this case the 
ones of the local estimation algorithm, on a given hardware platform while satisfying a set of 
nonfunctional requirements, such as response times, dependability, power efficiency, etc. Model- 
based design has been proven to be a successful methodology for supporting the system design 
process. Model-based methodologies use multiple models to capture the relevant properties of 
the design (when the required functionalities are mapped onto a given hardware configuration), 
for example, a model of the required functionalities, temporal behavior, power consumption, and 
hardware configuration. These models can then be used for various purposes, such as automatic code 
generation, architecture design, protocol optimization, system evolution, and so on. Important for 
the design process are the interactions between the different models, which can be expressed as 
constraints, dependencies, etc. In this section, a model-based design methodology is followed to 
assure dependability for state estimation in a sensor network via runtime reconfiguration. 

To illustrate the model-guided design process for distributed signal processing, let us consider 
an example. Two fundamental models for system design are emphasized here: the task model 
(capturing the required functionalities) and the physical model (capturing the hardware 
configuration of the implementation). For the sake of simplicity, a particular hardware configuration 
and communication topology is assumed; the question to answer is how the required 
functionalities can be realized on the given configuration, as shown in Figure 22.7. 

Figure 22.7 Modeling of signal processing and implementation. The task model (task properties: memory, instruction/cycle, update frequency; interaction properties: message size, update frequency) is mapped by the T-P mapping onto the physical model (processor properties: memory, execution speed, energy/instruction; communication properties: bit-rate, latency, packet loss probability, energy/bit). 

The task model in this figure is represented as a directed graph in which the signal processing 
components (tasks) are represented by the vertices of the graph, while their data exchanges (interactions) 
are represented by the edges. Both the tasks and the interactions are characterized by a set of 
properties, which typically reflect nonfunctional requirements or constraints. These properties are 
used to determine system-level characteristics, and thus the feasibility of certain design decisions 
can be tested (see details later). The tasks run on a connected set of processors, represented by the 
physical model of the system. The components of the physical model are the computing nodes, 
each consisting of a processor, memory, communication, and perhaps other resources, and the 
communication links. During the system design, the following steps are carried out (typically it is 
an iterative process with refinement cycles [45], but the iterations are not considered here): 


Select the algorithms for the processing realized by the tasks. 

Compose the task model. 

Select the hardware components for the physical model. 

Select a communication topology. 

Establish the mapping between the task model and the physical model. 


The design process involves a particular mapping that defines the assignment of a task $T_i$ to a 
processor $P_j$; that is, it determines which task runs on which node.* Obviously, the memory and 
execution time requirements define constraints when assigning the tasks to nodes. Further, data 
exchange between tasks makes the assignment problem more challenging in distributed configurations, 
as a task assignment also defines the use of communication links, and the communication 
links have limited capabilities (indicated by the attached property set in Figure 22.7). After every 
refinement cycle, according to the steps listed earlier, the feasibility of the resulting design should be 
checked. For example, an assignment of $T_3$ to $P_1$ and $T_4$ to $P_3$ may yield an infeasible design if 
the interaction $d_{34}$ imposes too demanding requirements on the communication link $c_{13}$, that is, 
a high data exchange rate or a large data size. On the other hand, assigning both $T_3$ and $T_4$ to $P_3$ may 
violate the processing capability constraint on $P_3$. Changing the hardware configuration and/or 
using less demanding algorithms (and eventually accepting the resulting lower performance) for 
implementing $T_3$ or $T_4$ could be a way out. 

* We assume that nodes are equipped with a multitasking runtime environment; consequently, multiple tasks can 
be assigned to a single node. 
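
The feasibility check of a task-to-processor mapping can be sketched in Python as follows. This is only an illustration under simplifying assumptions: every task is reduced to a memory need and an instruction rate, every link to a single bit-rate, and every interaction to a required bit-rate (message size times update frequency). All class names, field names, and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    memory: int          # bytes of memory needed
    instr_per_s: float   # instructions per cycle times update frequency


@dataclass(frozen=True)
class Processor:
    memory: int          # bytes of memory available
    instr_per_s: float   # execution speed


def check_mapping(tasks, procs, links, flows, mapping):
    """Return a list of violated constraints; an empty list means the mapping is feasible."""
    problems = []
    for p_name, proc in procs.items():
        hosted = [tasks[t] for t, p in mapping.items() if p == p_name]
        if sum(t.memory for t in hosted) > proc.memory:
            problems.append(f"memory exceeded on {p_name}")
        if sum(t.instr_per_s for t in hosted) > proc.instr_per_s:
            problems.append(f"processing exceeded on {p_name}")
    load = {}
    for (src, dst), bits_per_s in flows.items():
        a, b = mapping[src], mapping[dst]
        if a != b:  # tasks on the same node do not use a communication link
            key = frozenset((a, b))
            load[key] = load.get(key, 0.0) + bits_per_s
    for key, demand in load.items():
        if demand > links.get(key, 0.0):  # a missing link has zero capacity
            problems.append(f"link overloaded between {' and '.join(sorted(key))}")
    return problems


tasks = {"T3": Task(2000, 4e5), "T4": Task(2000, 7e5)}
procs = {"P1": Processor(8000, 1e6), "P3": Processor(8000, 1e6)}
links = {frozenset(("P1", "P3")): 1000.0}   # bit-rate of link c13
flows = {("T3", "T4"): 4000.0}              # demand of interaction d34
split = check_mapping(tasks, procs, links, flows, {"T3": "P1", "T4": "P3"})
joint = check_mapping(tasks, procs, links, flows, {"T3": "P3", "T4": "P3"})
```

With these made-up numbers, `split` reports an overloaded link and `joint` reports an exceeded processing capability, which mirrors the two infeasible assignments discussed above.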

Note that the design process results in a sequence of decisions, which lead to a feasible system 
design. Traditionally, the design process is “offline” (design time), that is, it is completed before 
the implementation and deployment of the system itself. The task model, the hardware configu- 
ration, and their characteristics are assumed to be known during this design time, and the design 
uncertainties are assumed to be low. Under these conditions, a model-based optimization can be 
carried out, delivering an optimal architecture ready for implementation. Unfortunately, these 
assumptions are overly optimistic in a wide spectrum of application cases. 

(Wireless) sensor networks deployed for monitoring large-scale dynamical processes are especially 
vulnerable. Sensor deterioration, node failure, unreliable communication, depleted batteries, etc., 
are not exceptions but common events in normal operation. These events result in changes in the 
system configuration, as it is captured by the physical model, due to which implementations relying 
on static designs may fail to deliver according to the specifications. A possible work-around is to 
build redundancy into the system and thereby, to implement fault-tolerance. In this case, the top- 
level functionalities remain intact until a certain level of “damage” is reached. This approach usually 
leads to complex and expensive implementations—unacceptable for the majority of applications. 
The components are “underutilized” in nominal operation, while power consumption is increased 
due to the built-in redundancy. The other approach is to accept the fact that maintaining a 
static configuration is not feasible and make the system such that it “follows” those changes and 
“adjusts” its internals to assure an implementation of the assigned functionalities as far as it is 
feasible. The resulting behavior typically exhibits a “graceful degradation” property, that is, until 
damage reaches a certain level the set of functionalities and their quality can be kept; beyond that 
level the system loses noncritical functionalities and/or the quality of running functionalities is 
reduced due to a shortage of resources. Realizing this latter approach has significant impact both 
on system design and on the runtime operation of the system. Conceptually, the system design 
process is not completely finished at design time; instead, a set of design alternatives is provided 
for execution. During operation, depending on the health state of the configuration and the 
conditions of the embedding environment, a selection is made automatically to assure an optimal 
use of available resources, that is, providing the highest level of the functionalities under the given 
circumstances. In the next section, typical solutions for implementing this latter approach are 
overviewed. 
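
The idea of selecting among prepared design alternatives at runtime can be sketched as follows. This is a minimal Python illustration, assuming that every alternative is annotated with a quality score and with its resource needs; the alternatives, resource names, and numbers are hypothetical.

```python
def select_alternative(alternatives, available):
    """Return the feasible alternative with the highest quality, or None if none fits."""
    feasible = [
        alt for alt in alternatives
        if all(need <= available.get(res, 0.0) for res, need in alt["needs"].items())
    ]
    return max(feasible, key=lambda alt: alt["quality"], default=None)


# Hypothetical design alternatives prepared at design time, resources as fractions.
ALTERNATIVES = [
    {"name": "fusion of estimates", "quality": 3, "needs": {"energy": 0.6, "bandwidth": 0.8}},
    {"name": "exchange measurements", "quality": 2, "needs": {"energy": 0.3, "bandwidth": 0.2}},
    {"name": "local filter only", "quality": 1, "needs": {"energy": 0.1, "bandwidth": 0.0}},
]
```

As the available resources shrink, the selected alternative steps down from the most capable one to the least demanding one, which corresponds to the graceful degradation described above.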


22.6.1 Approaches to Runtime Adaptivity 


Evolution of large-scale networked embedded systems in general and (wireless) sensor networks 
in particular poses a number of technical challenges on the design, implementation, testing, 
deployment, and operation processes [46]. Considering the reconfiguration as a “vehicle” to 
implement such evolution, the reconfiguration of the functionalities on the available hardware can 
be carried out at four different stages of the system's life cycle: 


1. Design time—configuration redesign, new code base, etc. 

2. Load time—new functionalities are implemented via code update. 

3. Initialization time—during system (or component) startup, the optimal design alternative is 
selected and parameterized depending on a “snapshot” of the context. 

4. Runtime—reconfiguration is performed while the system is in use. 


Here, only the runtime reconfiguration variant of the evolution is considered with special emphasis 
on the needs of distributed Kalman filtering. 

In case of runtime reconfiguration, the reconfiguration process is triggered by observation of 
changes in the embedding environment of the system or in the system itself, for example, realizing 
node failure or a low battery status. The “trajectory” for reconfiguration is not predefined but is a 
result of an optimization process attempting to maximize the “usefulness” of the system as defined 
by a performance criterion. The concept of the reconfiguration process is illustrated in Figure 22.8. 

The process relies on the model-based approach as introduced previously. The relevant models 
of the system, such as the task model, physical model, temporal model, etc., are formalized 
and stored in an efficiently accessible way in a database represented by the models block. The 
constraints block represents the dependencies in the models and between models. During operation 
of the signal processing systems, the MONITOR collects information about several aspects of the 
operation. Goals of the operation may change depending on, for example, different user needs. 
Changes in the observed phenomenon may invalidate the models assumed at design time. 
Similarly, internal changes in the system configuration should be recognized, 
such as broken communication and sensor failure. The MONITOR functionality checks whether the 
observed changes result in a violation of certain constraints of the system or in a significant drop in 
performance. If the MONITOR concludes that under the current circumstances the system cannot 
perform as requested, then the reconfiguration process is initiated. The central component is the 
REASONER, which, based on the models, constraints, and the actual findings, determines a new 
configuration that satisfies all constraints and provides an acceptable performance. It should be 
emphasized that the REASONER may carry out not only pure logical reasoning but also other types of 
search and optimization functions depending on the representation used to describe the models, 
goals, and so on. The new configuration is passed to the RECONFIGURATOR functionality to 
plan and execute the sequence of operations for “transforming” the old into the new configuration 
at runtime.* 

Figure 22.8 Reconfiguration process. The MONITOR observes the goals, environment, and configuration; on a violation or a drop in performance it triggers the REASONER, which passes a new configuration to the RECONFIGURATOR for reparameterization and rewiring. 
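
One pass through the MONITOR, REASONER, and RECONFIGURATOR can be sketched in Python as shown next. The sketch assumes that configurations and observations are plain dictionaries, that constraints are predicates over their union, and that the REASONER simply searches a given list of candidate configurations; none of these representation choices comes from the chapter.

```python
def monitor(state, constraints):
    """Return the names of the constraints that the current state violates."""
    return [name for name, holds in constraints.items() if not holds(state)]


def reasoner(candidates, observations, constraints, performance):
    """Pick the best-performing candidate configuration that violates nothing."""
    feasible = [
        cand for cand in candidates
        if not monitor({**observations, **cand}, constraints)
    ]
    return max(feasible, key=performance, default=None)


def reconfigurator(old, new):
    """Plan the parameter changes that transform the old configuration into the new one."""
    return [(key, old.get(key), value) for key, value in new.items() if old.get(key) != value]


def reconfiguration_step(config, observations, candidates, constraints, performance):
    """Run one MONITOR -> REASONER -> RECONFIGURATOR pass."""
    if not monitor({**observations, **config}, constraints):
        return config, []        # nothing is violated, keep the configuration
    new = reasoner(candidates, observations, constraints, performance)
    if new is None:
        return config, []        # no feasible alternative is known
    return new, reconfigurator(config, new)


# Hypothetical example: a low battery requires a sampling time of at least 20 s.
constraints = {"energy": lambda s: s["battery"] >= 0.2 or s["sampling_time"] >= 20}
candidates = [{"sampling_time": t} for t in (10, 20, 40)]
performance = lambda cand: -cand["sampling_time"]   # prefer faster sampling
new_config, plan = reconfiguration_step(
    {"sampling_time": 10}, {"battery": 0.15}, candidates, constraints, performance
)
```

In this example the low battery violates the energy constraint, so the step returns the fastest sampling time that is still feasible, together with the single parameter change needed to reach it.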

Note that the reconfiguration process of Figure 22.8 runs on the same embedded monitoring 
system that is used for signal processing. An efficient implementation of this runtime reconfiguration 
should address three challenges: 


■ Representation: What are the right formalisms to describe the models and their interaction? 
To what extent should the models be made part of the running code? What is an efficient 
model representation at runtime? 

■ Monitoring: How can we collect coherent information about the health state of the system, 
even in case of failures? How can we deduce the potentially disruptive situations, that is, 
those that should trigger reconfiguration actions, from the raw observation set? 

■ Reasoning: Which efficient algorithms match the model representation and 
resolve the conflicts arising from changes in the environment and/or in the system 
configuration? What are the chances for a distributed solution of the reasoning process? 


There are no ultimate answers to these questions. The application domains have crucial impact 
on the optimal representation and reasoning, as well as on the resources that are required to run 
the reconfiguration process itself. Consequently, a thorough analysis of the application at hand, its 
typical failure modes, the dependability requirements, and other relevant aspects of the system in 
its environment jointly identify the proper selection of techniques for setting up a suitable runtime 
reconfiguration process. 

The research area of runtime reconfigurable systems design is quickly evolving. Established 
domains such as self-adaptive software systems [47] and dynamically reconfigurable hardware systems 
[48,49] provide fundamental contributions. In the following, a few characteristic approaches are 
briefly addressed. A reconfiguration methodology based on model integrated computing (MIC) 
was introduced in [50]. Therein, the designer describes all relevant aspects of the system as formal 
models. A meta-modeling layer supports the definition of these relevant aspects that are to be 
modeled and generates the necessary model editors, that is, carries out model analysis, verification, 
etc. The program synthesis level consists of a set of model interpreters, which according to the 
supplied models and constraints generate program code. The reconfiguration is triggered by changes 
in the models or constraints, which initiates a new model interpretation cycle. Though MIC 
provides a flexible way to describe and implement reconfigurable systems, the model interpretation 
is a computationally demanding step and may seriously limit the applicability in real-time cases. 
Alternatively, a model-oriented architecture with related tools for runtime reconfigurable systems 
was presented in [51]. This approach uses variability, context, reasoning, and architecture models 
to capture the design space. At runtime, the interactions among the event processor, goal-based 
reasoner, aspect model weaver, and the configuration checker/manager components carry out 
the reconfiguration. The approach is well suited for coping with a high number of artifacts, but 
the real-time aspect is not well developed. A formalization of the reconfiguration as a constraint 
satisfaction problem was proposed in [52-54]. The design space is (at least partially) represented 
and its design constraints are explicitly stated. These methodologies implement a “constraint-guided” 
design space exploration to find feasible solutions under the observed circumstances. In 
parallel, a suitable performance criterion is calculated to guide the reconfiguration process to an optimal 
solution. The method described in [54] is also capable of hardware/software task migration and 
morphing. Different reconfiguration solutions were developed for service-oriented architectures 
(SOAs). For example, the reconfiguration method introduced in [55] extends the “traditional” 
discover-match-coordinate SOA scheme with a hierarchical service overlay mechanism. This 
service overlay implements a composition functionality that can dynamically “weave” the required 
services from the available service primitives. In [56], a solution is proposed that follows an 
object-centric paradigm to compose the compound services. By modeling the service constraints, 
an underlying constraint satisfaction mechanism implements the dynamic service configuration. 
A different approach was presented in [57], which describes a model-based solution to validate 
at runtime that the sensor network functionalities are performed correctly, despite changes in 
the operational conditions. It models the application logic, the network topology, and the test 
specification, which are then used to generate diagnostic code automatically. Though the solution 
does not address the REASONER functionality of Figure 22.8, it delivers low false-negative 
detection rates, that is, it covers the MONITOR functionality effectively. 

* The operations for “transforming” the configuration act on the program modules implementing the task graph and 
on a “switchboard” realizing the flexible connections among the tasks. Consequently, the program modules should 
implement a “standard” application programming interface (API), which allows for a function-independent, 
unified configuration interface to software components. This way, parameter changes in the signal processing 
functions and in the connections between these functions can be carried out irrespective of the actual functions 
involved in the processing. 


22.6.2 Implementation of Runtime Reconfiguration 


The runtime reconfiguration brings in an extra aspect of complexity, which is “woven” into the 
functional architecture of the system and thus makes the testing and validation extremely chal- 
lenging. To keep the development efforts on a reasonable level, both design and implementation 
support are needed. Many of the runtime reconfiguration approaches cited propose an architectural 
methodology, design tool set, and runtime support, for example, [46,50,52,54-58]. A common 
feature of these efforts is to support the system developer with application-independent reconfigura- 
tion functionalities, which can be parameterized according to the concrete needs of the application 
at hand. They also attempt to “separate concerns” when feasible, that is, try to make the design 
of the functional architecture and the reconfiguration process as independent as possible, while 
still maintaining clear interactions between them. Typically, the corresponding reconfiguration 
functionalities manifest themselves in an additional software layer between the “nominal” real-time 
executive layer, such as TinyOS [59] or Contiki [60], and the application layer. See Figure 22.9 
for more details. The (application independent) monitoring and reconfiguration functionalities 
in this figure receive the application specific information from “outside” in the form of models. 
Conceptually they are “interpreters.” As such, they realize a virtual machine dedicated to a certain 
type of computational model, for example, rule-based inference, finite state machine, constraint 
satisfaction, and so on. They read in the application specific “program,” which is represented 
by the reconfiguration rules component in Figure 22.9, and interpret its code in the context of 
the data received from the MONITOR function. For example, if the reconfiguration process is 
based on a rule-based representation of the application specific knowledge,* then the REASONER 
implements a forward chaining (data-driven) inference engine [61], which uses the actual 
configuration and the data received from the MONITOR as its fact base. The inference process results in 
derived events and actions, which define the reconfiguration commands issued for the application 
layer. The application program is characterized by (multi-aspect) models, requirements, and constraints 
that are created by the designer according to, for example, [57]. For efficiency reasons, 
the models created by a designer are rarely used directly by the reconfiguration process. Instead, 
after thorough compile-time checking, these models are translated to a “machine friendly” format 
to enable resource-aware access and transformations. The models can also be used for automatic 
code generation and synthesis to create the application code if the appropriate tools are available 
[50]. The monitoring functionality of Figure 22.9 (equivalent to the one in Figure 22.8) defines 
the set of observations that a reconfiguration process should take into consideration. Typically, this 
monitoring should cover the operational characteristics of an application, for example, sensor noise 
level and estimator variance, combined with the health state of its execution platform, for example, 
battery energy level and quality of communication channels. The reconfiguration rules then define 
the “knowledge base” of the reasoner/reconfigurator, which is also depicted in Figure 22.9; for 
example, they determine how the recognized changes in operational conditions are handled. Note 
that reconfiguration rules do not necessarily refer to a rule-based knowledge base; rather, the format 
and content of the “rules” are determined by the reasoning procedure, for example, constraint satisfaction, 
graph matching, first-order logic, etc. Further note that the application layer of Figure 22.9 
uses a number of reconfigurable components ($c_1 \ldots c_n$) to implement the required application-level 
functionality. These components should implement a unified API, so that the middleware layer 
is able to retrieve information for its monitoring purposes and to execute its reconfiguration 
commands. 

Figure 22.9 Middleware for runtime reconfiguration. 

* This type of formalism will be used in the case study later in the chapter. 
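
A unified component API of this kind could look as follows in Python. This is a sketch under assumptions: the interface name `Reconfigurable`, its two methods, and the example component are hypothetical and only illustrate how a middleware layer can monitor and reconfigure components without knowing their actual function.

```python
from typing import Any, Dict, Protocol


class Reconfigurable(Protocol):
    """Hypothetical unified interface that every reconfigurable component offers."""

    def get_status(self) -> Dict[str, Any]: ...
    def set_parameter(self, name: str, value: Any) -> None: ...


class LocalEstimator:
    """Example component; it satisfies Reconfigurable without inheriting from it."""

    def __init__(self, sampling_time: float) -> None:
        self._params: Dict[str, Any] = {"sampling_time": sampling_time}
        self.variance = 1.0  # placeholder for a monitored quantity

    def get_status(self) -> Dict[str, Any]:
        return {**self._params, "variance": self.variance}

    def set_parameter(self, name: str, value: Any) -> None:
        if name not in self._params:
            raise KeyError(f"unknown parameter: {name}")
        self._params[name] = value


def apply_commands(components: Dict[str, Reconfigurable], commands) -> None:
    """Middleware side: execute (component, parameter, value) commands uniformly."""
    for component, name, value in commands:
        components[component].set_parameter(name, value)
```

Because the middleware only relies on `get_status` and `set_parameter`, the same monitoring and reconfiguration code can serve any component that offers these two methods.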

It should be emphasized that in certain applications the reconfiguration decisions could rely on 
system-wide information. In these cases, the monitoring and reconfiguration activities inherently 
involve communication, resource scheduling, etc. This adds an extra layer of complexity to 
the systems, for example, implementing distributed snap-shot algorithms, leader election and 
distributed reasoning/planning, that may demand resources beyond the capabilities of the nodes. 
A work-around is to give up the fully distributed implementation of the reconfiguration and assign 
the most demanding functionalities to (dedicated) powerful nodes, as proposed in [50,52,54]. The 
monitoring information is then forwarded to the reconfiguration node(s) where a new configuration 
is determined. The reconfiguration commands are transferred back to the nodes for synchronized 
execution. 

In the next section, the role of runtime reconfiguration will be demonstrated. It follows from the 
inherent network topology properties assumed in distributed state estimation that reconfiguration 
decisions are based on the information from local and neighboring nodes. As such, a distributed 
implementation of runtime reconfiguration is feasible, even on nodes of moderate computing 
capabilities. 


22.7 Case Study on a Diffusion Process 


The results of the presented self-organizing sensor network for state estimation are demonstrated 
and evaluated in a spatiotemporal 2D diffusion process. The goal of the sensor network is to follow 
the contaminant's distribution profile in time (i.e., the concentration distribution in space and 
time of a particular chemical compound) in the presence of wind. To that end, let us consider 
an area of 1200 x 1200 meters containing a contaminant source. As time passes, the contaminant 
spreads across the area due to diffusion and wind. To simulate the spread, let us divide the area 
into a grid with a grid-size of 100 meters. The center of each grid-box is defined as a grid-point. 
Then, the spread of the contaminant is represented by the concentration level $\rho^{(q)} \in \mathbb{R}_+$ at the 
$q$-th grid-point, $q \in \mathbb{Z}_{[1,144]}$. This concentration level $\rho^{(q)}$ depends on the corresponding levels at 
neighboring grid-points, which are denoted as $q_n$ for north, $q_s$ for south, $q_e$ for east, and $q_w$ for west. 
See Figure 22.10 for a graphical representation of these grid-points relative to the $q$-th grid-point. 
Further, the continuous-time process model of $\rho^{(q)}$, for some $a, a_n, a_s, a_e, a_w \in \mathbb{R}$, yields 

$$\dot{\rho}^{(q)} = a\rho^{(q)} + a_n\rho^{(q_n)} + a_s\rho^{(q_s)} + a_e\rho^{(q_e)} + a_w\rho^{(q_w)} + u^{(q)}, \quad \forall q \in \mathbb{Z}_{[1,144]}.$$ 

Figure 22.10 The monitored area is divided into a grid. Each grid-point $q$ has four neighbors $q_n$, 
$q_s$, $q_e$, and $q_w$, that is, one to the north, south, east, and west of grid-point $q$, respectively. The 
chemical matter produced by the source spreads through the area due to diffusion and wind. 


The variable $u^{(q)} \in \mathbb{R}_+$ in (22.7) parameterizes the production of chemical matter by a source at 
grid-point q and follows 419 =75, u? =75, 430 =100, uD = 100, and uÍ? = 175 for all 
time $t \in \mathbb{R}_+$, while $u^{(q)} = 0$ for all other $q \in \mathbb{Z}_{[1,144]}$. The remaining parameters are chosen to 


i i irecti i = 2 A e Z 
establish a northern the wind direction, that is, a= 300» 4n = gp 4 = gop Ze = gy and 
= 2 
A 800° . . . . . 
A sensor network is deployed in the area to reconstruct the concentration levels at each grid-point 


based on the local measurements taken by each node. 
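
A grid model of this form can be simulated with a few lines of Python. The sketch below assumes a forward-Euler discretization with a step of 1 s, a single constant source, and made-up coefficients in which the inflow from the northern neighbor dominates; these coefficient values, the source strength, and the convention that row 0 is the northern edge are assumptions and not the values used in the chapter.

```python
N = 12  # 1200 m / 100 m: 12 x 12 grid-boxes, i.e., 144 grid-points

# Assumed coefficients (not the chapter's values): a_n > a_s means more inflow
# from the northern neighbor, i.e., transport toward the south.
COEF = {"a": -0.009, "n": 0.004, "s": 0.001, "e": 0.002, "w": 0.002}


def neighbors(q):
    """Map each direction to the 0-based index of the neighbor, or None at the border."""
    row, col = divmod(q, N)  # row 0 is taken as the northern edge
    return {
        "n": q - N if row > 0 else None,
        "s": q + N if row < N - 1 else None,
        "e": q + 1 if col < N - 1 else None,
        "w": q - 1 if col > 0 else None,
    }


def euler_step(rho, u, coef=COEF, dt=1.0):
    """Advance all concentration levels by one forward-Euler step of length dt."""
    new_rho = []
    for q in range(N * N):
        rate = coef["a"] * rho[q] + u[q]
        for direction, j in neighbors(q).items():
            if j is not None:  # grid-points outside the area contribute nothing
                rate += coef[direction] * rho[j]
        new_rho.append(rho[q] + dt * rate)
    return new_rho


def simulate(source_index, source_rate, steps):
    """Simulate a single constant source, starting from a clean area."""
    rho = [0.0] * (N * N)
    u = [0.0] * (N * N)
    u[source_index] = source_rate
    for _ in range(steps):
        rho = euler_step(rho, u)
    return rho
```

With these assumed coefficients, the concentration directly south of the source grows faster than the concentration directly north of it, which is the qualitative effect of a wind blowing from the north.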




Communication The network consists of 18 sensor nodes that are randomly distributed across 
the area, see also Figure 22.10. It is assumed that the sensor nodes communicate only with 
their direct neighbors, that is, nodes within a 1-hop distance, and that their position is available. 

Process Neither the wind direction nor values of the contaminant source are available to the 
nodes. Therefore, the process model that is used by the local estimation algorithms of the 
different nodes is a simplified diffusion process in continuous-time, that is, 


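$$\dot{\rho}^{(q)} = \alpha\rho^{(q)} + \alpha_n\rho^{(q_n)} + \alpha_s\rho^{(q_s)} + \alpha_e\rho^{(q_e)} + \alpha_w\rho^{(q_w)} + w^{(q)},$$ 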

with «= =e Xn = at a= ae o= seo and dy = sa The unknown source 
and model uncertainties are represented by process noise w? e R, for all q € Zi144]- A 
suitable characterization of this noise, that is, to cover unknown source values (7 between 
—150 and 150, is given by the continuous-time PDE pw? (t)) = G(w(t), 0, 2-10). 
Further, the state is defined as the collection of all concentration levels, that is, 
$x := (\rho^{(1)}\ \rho^{(2)}\ \cdots\ \rho^{(144)})^\top$. The model parameters $A_{\tau_i}$ and $Q_{\tau_i}$ of the discrete-time process 
model in (22.1) are characterized with the initial sampling time of $\tau_i = 10$ s, for all nodes 
$i \in \mathcal{N}$. To determine the other process model parameters, that is, $C_i$ and $V_i$, it is assumed 
that each sensor node $i$ measures the concentration level at its corresponding grid-point, that 
is, $y_i[k_i] = \rho^{(q)}[k_i] + v_i[k_i]$ for some $q \in \mathbb{Z}_{[1,144]}$ and $p(v_i[k_i]) = G(v_i[k_i], 0, 0.5)$, for all 
$i \in \mathbb{Z}_{[1,18]}$. The real concentration levels at the three time instants $t = 140$, $t = 240$, and 
$t = 340$ are illustrated in Figure 22.11. 


Figure 22.11 The simulated concentration levels at the different grid-points for two instances 
of the time $t \in \mathbb{R}_+$ (panels: $t = 140$ s and $t = 240$ s). 

The objective of the sensor network is to determine the contaminant distribution by estimating 
the state $x$ in multiple nodes of the network. This is carried out in two types of sensor networks: 
a hierarchical network and a fully distributed one. In each configuration, unforeseen events occur 
indicating node breakdown and batteries depleting below critical energy level. The nodes must 
adapt their local state estimating functionalities to recover from lost neighbors and/or to reduce 
their energy consumption so that batteries do not get depleted. Let us start this analysis with the 
hierarchical network. 


22.7.1 Hierarchical Sensor Network 


In a hierarchical sensor network, nodes are given specific tasks prior to deployment. Basically, the 
network consists of multiple subnetworks, as illustrated in Figure 22.12a. In each subnetwork, 
nodes exchange their local measurements with the center node of that particular subnetwork 
(denoted with dashed lines). The center node computes a local estimate based on these received 
measurements via $f_{KF}$, after which this estimate is shared with the center nodes of other subnetworks 
(denoted with the solid lines). The received estimates are then fused with the local estimate according 
to the merging function $f_{ME}$ and the fusion method ellipsoidal intersection of (22.17). 

Two events will occur in this network, followed by the corresponding action as it is implemented 
in the reconfiguration process of each node. The reconfiguration is local: Operational events are 
monitored locally and the reconfiguration actions influence only the node that issued the request 
for action. A rule-based representation formalism is used to define the “knowledge base” of the 
reconfiguration functionality. As such, the REASONER component of the middleware implements 
a forward chaining rule interpreter, that is, if event then action [61-63]. For clarity of the illustrative 
example, we do not attempt a rigorously formal description of the knowledge base but only the 
“style” of the rule-based representation is shown. 


■ At $t = 150$ s, nodes 1, 3, and 8 will cross their critical energy level. 
If the critical energy level is crossed, then increase the node’s local sampling time from 10 s to 20 s. 

■ At $t = 250$ s, node 5 will break down. To detect whether a state-estimating node has broken 
down, the nodes within each subnetwork exchange acknowledgments, or heartbeat messages are 
used to indicate the normal operational mode. 
If the acknowledgment of the state-estimating node is not received, then check the energy levels of 
all other nodes in the corresponding subnetwork. The node with the largest energy level takes over 
the responsibility for estimating the state, according to an algorithm that is similar to that of the node 
that broke down. Also, reestablish the connection with the other subnetworks. 

Figure 22.12 Network topology in a hierarchical network. (a) Initial topology and (b) topology 
after 250 s. 


As an example, the following rule set shows the handling of the second event: 


Rule_2a: 
    IF NEIGHBOR(?x) & TIMEDOUT(?x) & ?x.function = centerfun 
    THEN set(go_for_newcenter, TRUE) 

Rule_2b: 
    IF go_for_newcenter & NEIGHBOR(?x) & 
       !TIMEDOUT(?x) & max(?x.power) = self.power 
    THEN exec(assign, centerfun), exec(broadcast, centerfun_msg) 
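
The two rules can be mimicked by a very small forward chaining interpreter in Python. This is an illustrative sketch and not the middleware of the chapter: the fact base is a plain dictionary, the condition `max(?x.power) = self.power` is read as "no live neighbor has more energy than this node", and ties between equally charged nodes are not resolved. All field names are hypothetical.

```python
def rule_2a(facts):
    """A timed-out neighbor that ran the center function raises go_for_newcenter."""
    lost_center = any(
        nb["timed_out"] and nb["function"] == "centerfun" for nb in facts["neighbors"]
    )
    if lost_center and not facts["go_for_newcenter"]:
        facts["go_for_newcenter"] = True
        return True
    return False


def rule_2b(facts):
    """The node with the most energy among the live neighbors takes over."""
    alive = [nb for nb in facts["neighbors"] if not nb["timed_out"]]
    strongest = all(nb["power"] <= facts["self_power"] for nb in alive)
    if facts["go_for_newcenter"] and strongest and not facts["actions"]:
        facts["actions"] += [("assign", "centerfun"), ("broadcast", "centerfun_msg")]
        return True
    return False


def forward_chain(facts, rules):
    """Fire rules until no rule changes the fact base any more."""
    while any(rule(facts) for rule in rules):
        pass
    return facts


def node_facts(self_power, neighbors):
    """Initial fact base of a node (hypothetical layout)."""
    return {"self_power": self_power, "neighbors": neighbors,
            "go_for_newcenter": False, "actions": []}


# View of node 2 after node 5 (the center) stopped sending acknowledgments.
node_2 = forward_chain(node_facts(0.8, [
    {"id": 5, "timed_out": True, "function": "centerfun", "power": 0.0},
    {"id": 1, "timed_out": False, "function": "sensing", "power": 0.4},
]), [rule_2a, rule_2b])
```

In this example node 2 holds the most energy among the live nodes, so its fact base ends up with the assign and broadcast actions, whereas a neighbor with less energy would only set `go_for_newcenter`.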


Figure 22.12a depicts the network topology prior to the event that node 5 breaks down, while 
Figure 22.12b illustrates this topology after the event (assuming that the battery of node 2 has 
the highest energy level). This figure indicates that node 2 has become responsible for estimating 
the state and thereby replaces node 5, which broke down at $t = 250$ s. Further, Figure 22.13 
depicts a particular estimation error, for which the estimation error of a single node $i$ is defined as 
$\Delta_i := (x - \hat{x}_i)^\top (x - \hat{x}_i)$. More specifically, the figure presents the difference between the estimation 
error in a network that is not affected by operational events and the estimation error in a network that is 
affected by the previously presented operational events. The figure depicts the results 
of node 7 because this node is affected by both events. 

Before Figure 22.13 is analyzed, let us denote the hierarchical network in the presence of the 
aforementioned operational events as the reconf-case and the hierarchical network in the absence 
of operational events as the ideal-case. The figure then indicates that the results of the reconf-case 
and the ideal-case are equivalent until the first operational event occurs, which is expected as 
both network cases are identical until 150 s. After that time, the estimation error of node 7 in the 
reconf-case increases with respect to the ideal-case. This is due to the fact that nodes 1, 3, and 
8 double their local sampling times from t = 150 s on and thus, node 7 will receive half as much 


Figure 22.13 The difference in the estimation error of node 7 between a network that is not affected 
by operational events ($\Delta^{\text{ideal}}$) and a network that is affected by the 
previously presented operational events ($\Delta^{\text{reconf}}$), plotted against time (s). 
measurement information from nodes 1 and 3. This leads to an increase in the estimation error of 
node 7 compared to the ideal-case. Further, note that this error decreases when local measurement 
information from nodes 1 and 3 is received, that is, at the time instants 170, 190, 210,...,370, 
390. At these instants node 7 receives two more local measurements, that is, $y_1$ and $y_3$, which 
is not the case at the other sampling instants as nodes 1 and 3 doubled their local sampling time. 
After the second operational event, that is, node 5 breaks down at t = 250 s, the difference in the 
estimation error of node 7 for the reconf-case with respect to the ideal-case decreases (on average). 
This behavior can be explained by the fact that node 2 has become a direct neighbor of node 
7, whereas node 2 was an indirect neighbor via node 5 prior to t = 250 s. Since node 2 is closer to 
the contaminant source, node 7 obtains an improved estimation result when node 2 is its direct 
neighbor rather than an indirect one. 
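The time instants quoted above follow directly from the change in sampling time. As a small illustration, the Python sketch below generates the sampling instants of a node whose sampling time doubles at t = 150 s; the function name, the start at t = 0 s, and the 400 s horizon are assumptions chosen only to reproduce the listed instants.

```python
def sampling_instants(t_end, period, t_switch, new_period):
    """Instants at which a node samples when its sampling time changes
    from `period` to `new_period` at time `t_switch` (all in seconds)."""
    instants = []
    t = 0
    while t <= t_end:
        instants.append(t)
        t += period if t < t_switch else new_period
    return instants


HORIZON = 400  # assumed simulation length in seconds

# Node 7 keeps sampling every 10 s; node 1 (and likewise node 3) switches to 20 s.
node7 = sampling_instants(HORIZON, 10, t_switch=float("inf"), new_period=10)
node1 = sampling_instants(HORIZON, 10, t_switch=150, new_period=20)

# Instants after the event at which node 7 still receives y_1 (and y_3).
shared = [t for t in node7 if t > 150 and t in node1]
print(shared[0], shared[1], shared[-1])
```

At every other instant of node 7 after t = 150 s, no new measurement from nodes 1 and 3 is available, which is what produces the alternating error pattern.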


22.7.2 Distributed Sensor Network 


The distributed sensor network reflects an ad hoc networked system. This means that the nodes 
establish a mesh network topology, as depicted in Figure 22.14a. Since there is no hierarchy 
in this network, each node estimates the local state by performing the distributed KF of (22.14): 
the local measurement is processed by $f_{\mathrm{KF}}$ to compute a local estimate of the state, which is then 
shared with neighboring nodes as input to the merging function $\Omega$, which employs the state fusion 
method ellipsoidal intersection of (22.17). 

Two events will occur in this network, followed by the corresponding action as it is implemented 
in the management layer of each node. 


■ At t = 150 s, nodes 1, 3, and 8 will cross their critical energy level. 
If the critical energy level is crossed, then increase the node’s local sampling time from 10 s to 20 s. 
■ At t = 250 s, nodes 5 and 11 will break down. Nodes detect that another node has broken 
down, since no new local estimates are received from that node. 
If a node breaks down and the network has lost its connectivity, then establish network 
connections with other nodes until this connectivity is reestablished. If this requires increasing 
the communication range to larger distances, increase the sampling time accordingly. 
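The second action hinges on a test of whether the surviving nodes still form a connected network. One possible way to express such a test is a breadth-first search over the live nodes, sketched below in Python. The five-node line topology and the helper names are hypothetical; they are not the topology of Figure 22.14, and the sketch uses a global view of the links, whereas the chapter's nodes act on local information only.

```python
from collections import deque


def is_connected(adjacency, failed=frozenset()):
    """Breadth-first search restricted to the nodes that are still alive."""
    alive = [n for n in adjacency if n not in failed]
    if not alive:
        return True
    seen = {alive[0]}
    queue = deque([alive[0]])
    while queue:
        node = queue.popleft()
        for nb in adjacency[node]:
            if nb not in failed and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == len(alive)


def add_link(adjacency, a, b):
    """Establish a (possibly long-range) bidirectional link between a and b."""
    adjacency[a].add(b)
    adjacency[b].add(a)


# Hypothetical line topology 1-2-3-4-5 in which node 3 breaks down.
links = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
failed = {3}
print(is_connected(links, failed))  # False: the network is split in two
add_link(links, 2, 4)               # nodes 2 and 4 extend their range
print(is_connected(links, failed))  # True: connectivity is reestablished
```

In the scenario of the text, the two nodes that bridge the gap would then also increase their local sampling time to compensate for the larger communication range.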


Figure 22.14a depicts the network topology prior to the event that nodes 5 and 11 break down, 
while Figure 22.14b illustrates the topology after the event. This figure indicates that the 
Figure 22.14 Network topology in a distributed network. (a) Initial topology and (b) topology 
after 250 s. 

Figure 22.15 The difference in the estimation error of node 7 between a network that is not affected 
by operational events ($\Delta^{\text{ideal}}$) and a network that is affected by the 
previously presented operational events ($\Delta^{\text{reconf}}$). 


sensor network reestablishes connectivity, even after nodes have broken down. However, 
nodes 6 and 15 will have to exchange data with nodes that are far away. Therefore, these nodes 
will increase their local sampling time to 20 s. Further, Figure 22.15 depicts the same estimation 
error as Figure 22.13, but for a distributed network. This means that the figure presents the 
difference between the estimation error in a network that is not affected by operational events and 
the estimation error in a network that is affected by the previously presented operational events. 
The figure depicts the results of node 7 because this node is affected by both events. 

Before Figure 22.15 is analyzed, let us denote the distributed network in the presence of the 
aforementioned operational events as the reconf-case and the distributed network in the absence of 
operational events as the ideal-case. The figure then indicates behavior similar to that of the 
hierarchical network that was previously discussed; for example, the results of the reconf-case and 
the ideal-case are equivalent until the first operational event occurs, after which the error of the 
reconf-case increases with respect to the ideal-case. Also, the estimation results of node 7 have an 
“up-down” type of behavior, which is due to the action undertaken by nodes 1 and 3 to double 
their local sampling times. As such, node 7 receives an updated estimate from nodes 1 and 3 at 
every other one of its local sampling instants. The estimation error of node 7 
in the reconf-case increases even further with respect to the ideal-case after the second operational 
event, i.e., when nodes 5 and 11 break down at t = 250 s. 

Both the illustrative case studies of a hierarchical and a distributed sensor network indicate 
that the state is estimated by multiple nodes in the network, even in the presence of unforeseen 
operational events. As such, adopting a self-organizing method in large-scale and ad hoc sensor 
networks improves the robustness of state estimation within the network. 


22.8 Conclusions 


Ad hoc sensor networks typically consist of a large number of vulnerable components connected 
via unreliable communication links and are sometimes deployed in harsh environments. Therefore, 
the dependability of the networked system is a challenging problem. This chapter presented an 
efficient and cost-effective answer to this challenge by employing runtime reconfiguration techniques 
in addition to a particular signal processing method (Kalman filtering). More precisely, a distributed 
Kalman filtering strategy was presented for self-organizing sensor networks. This means that each node 
computes a local estimate of the global state based on its local measurement and on the data 
exchanged by neighboring nodes. The self-organizing property was implemented via a runtime 
reconfiguration process, so as to obtain a sensor network that is robust to external and internal system 
changes, for example, nodes that are removed from or added to an existing network during operation. 

Firstly, a brief overview of existing solutions for distributed Kalman filtering was presented. 
The corresponding algorithms were described with equivalent input and output variables. As a 
result, nodes could choose which of the algorithms is currently best suited for estimating the 
state vector, while taking into account the available communication and computational resources. 
This further enabled nodes to select what information is to be shared with other nodes, that is, 
local measurements or local estimates, and how the received information is merged with the local 
estimate. Secondly, the system architecture was addressed, such that challenging design issues could 
be separated from the actual implementation of a (self-organizing) distributed KF. To that end, 
an overview of typical reconfiguration approaches was given with an emphasis on the interactions 
between the signal processing and hardware/communication aspects of system design. After that, 
the self-organizing property of the proposed distributed KF was assessed in a diffusion process for 
two types of sensor networks, that is, a hierarchical network and a fully distributed one. In both 
cases, the network was able to cope with unforeseen events and situations. Put differently, employing 
runtime reconfiguration in the nodes of the sensor network implements a kind of self-awareness 
with the ability to create corrective actions, thus ensuring that data processing functionalities 
are never used beyond their scope of validity. 
23.1 Introduction 


Conventional wireless sensor networks (WSNs) are generally made up of a set of autonomous 
multifunctional sensor nodes distributed over a specific environment. These sensor nodes are used 
to collect environment data and transfer these data to the user through the network that can 
include Internet segments. Besides collecting data, a node may also need to perform computations 
on the measured data. In general, the deployment of conventional sensor networks for environmental 
monitoring is limited mainly by the active life span of the onboard non-rechargeable power 
source. As the sensors are battery powered, it becomes difficult to periodically monitor and manually 
replace the batteries. Considerable research effort has been directed at prolonging 
the limited lifetime of WSNs through efficient circuit, architecture, and communication techniques 
[1,2]. In summary, the use of a WSN system is strictly limited by the battery life of the sensor nodes 
[36]. There is a need for a new sensor network paradigm that is not based on an enhanced lifetime 
of a conventional WSN but is developed for a network that is free of any battery constraints. 

A wireless passive sensor network (WPSN) is a non-disposable and a cost-efficient system that 
operates based on the incoming received power [3-7]. This system is considered to be an efficient 
and a novel solution for energy problems in WSN [3]. The concept to remotely feed a sensor node 
on the power from an external radio frequency (RF) source has led to the emergence of the WPSNs. 
This concept was first introduced to power a passive RF identification (RFID) tag. It is well known 
that passive RFID design blocks form the basis for passive sensor node (PSN) architectures [8]. 
PSN operating frequencies fall under the same industrial, scientific and medical (ISM) frequency 
bands as most RFID applications. The latest trend in environmental monitoring applications is to 
have sensor nodes operating at power levels low enough to enable the use of energy harvesting 
techniques [9,10]. In theory, this allows the deployed system to sense continuously for a 
considerably extended period of time, which reduces recurring costs. 

The building blocks of a typical wireless PSN architecture are a sensing unit, a communication 
unit, a processing unit, and a power source, as shown in Figure 23.1 [4,5]. The sensing unit 
in most cases consists of one or more sensors and an analog-to-digital converter (ADC). 
A sensor is a device generally used to measure some physical quantity such as temperature, light, 
etc. The ADC is used to convert the received analog data signal into a digital signal so that it can be 
processed by the microcontroller. The processing unit consists of a low-power microcontroller and 
a storage block. The microcontroller processes data and controls and coordinates the functionality 
of the other components. The communication unit consists of an RF transceiver module that transmits 
data to and receives data from other devices connected to the wireless network. The power unit mainly 
delivers the RF-to-DC converted power to the rest of the node units and also stores additional power 
based on availability. 

The major differences in the architectures of a conventional WSN node and a WPSN node are 
the hardware of the power unit and the transceiver [4]. The power unit of the conventional WSN 
generally consists of a battery along with a support block called the power generator. The power 
unit for a WPSN node is basically an RF-to-DC converter—capacitor network. The converted DC 
power is used to wake up and operate the node or is kept in a charge capacitor for future usage. 


Figure 23.1 General WPSN node architecture. (Block labels: sensing unit; processing unit with 
microcontroller and memory; communication unit with transceiver; power unit with capacitor.) 
A conventional WSN node uses a short-range RF transceiver, which is typically a major 
power-consuming unit on the node, whereas a WPSN node uses a much simpler transceiver for 
modulated backscattering [4,11]. 

To minimize the power consumed at the sensor node, the simplest of all solutions is to eliminate 
any or at least most signal processing at the node by transferring sampled data from all nodes to a 
central server. But there exist several applications in real-time control and analysis of continuous 
data sampling processes, where real-time signal processing and conditioning are implemented on 
discrete time sampled data. In many cases, the sensor data at each node must be preprocessed 
or conditioned before it can be handled by the processor. In such a scenario, where each sensor 
node must include a processor, there is a need for application-based dedicated processor hardware 
implementations that improve power efficiency and allow fine-grained design optimization. This 
chapter introduces a low-power conceptual design for a distributed architecture of a single passive 
sensor processor. 

Typically, the smaller the area of the processor used in WPSN node architecture, the lower 
is the price. Using small-area processors, which require minimum power to operate, to provide 
greater read distances is significant in this scenario. WPSN being an emerging research area, there 
is little documentation on all the power-efficient scenarios applicable to passive sensor devices. In 
Refs. [3,4,6,7], efficient antenna designs and low-power transceivers were introduced for WPSNs. Not 
only is it important to have energy-efficient front-end and power unit designs, but there is also a 
need to have low-power novel processor designs that allow greater ranges for WPSN nodes. This 
chapter forms the basis for the low-power passive distributed sensor node architecture providing 
an increased operating range of the passive device. 

Consider an intelligent sensor network topology with a sink node in the center communicating 
with several nodes around it. The server and a single sensor combination can be viewed as a single 
instruction single data (SISD) processor or the intelligent sensor network as a whole can be viewed 
as a single instruction multiple data (SIMD) processor. The power consumption of such an SIMD 
system depends on the hardware complexity of the passive node processor, which in turn depends 
on the instruction set (IS) supported by the architecture. 

An intelligent combination of circuit techniques, applications, and architecture support is 
required to build a low-power sensor node system. This chapter introduces and elaborates the 
key concepts of the low-power design of the WPSN node processor. Using the 8051-ISA as an 
example for the distributed design concept, application-based customization of the sensor processor 
architecture is also elucidated in the later sections. 


23.2 Low-Power Circuit Techniques in WSNs 


Energy has become a critical aspect in the design of modern wireless devices and especially in 
WSNs. There is a need for a new architecture that takes into account such factors especially for 
passive or battery-operated device applications. Energy is defined in general as the sum of switching 
energy plus the leakage current energy. 

The energy consumption equation is given as follows [1,12]: 


$E_{\text{total}} = V_{dd}\,(\alpha C_{sw} V_{dd} + I_{\text{leakage}}\,\Delta t_{op})$. (23.1) 


Switching activity for 1 s is represented by $\alpha$, and the amount of time required to complete an 
operation is denoted by $\Delta t_{op}$ in Equation 23.1. $V_{dd}$, $I_{\text{leakage}}$, and $C_{sw}$ in Equation 23.1 
represent the supply voltage, leakage current, and switching capacitance, respectively. 
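As a quick numerical illustration of Equation 23.1, the Python sketch below evaluates the two energy terms separately. The function name and the numerical values are arbitrary assumptions chosen for illustration; they do not describe any particular sensor node.

```python
def total_energy(v_dd, alpha, c_sw, i_leakage, dt_op):
    """Evaluate Equation 23.1: E_total = V_dd * (alpha*C_sw*V_dd + I_leakage*dt_op)."""
    switching = alpha * c_sw * v_dd ** 2   # switching energy term
    leakage = v_dd * i_leakage * dt_op     # leakage energy term
    return switching + leakage


# Illustrative values only: halving V_dd quarters the switching term
# but merely halves the leakage term.
e_full = total_energy(v_dd=2.0, alpha=0.5, c_sw=1.0, i_leakage=0.0, dt_op=1.0)
e_half = total_energy(v_dd=1.0, alpha=0.5, c_sw=1.0, i_leakage=0.0, dt_op=1.0)
print(e_full / e_half)
```

The quadratic dependence of the switching term on the supply voltage is the reason why voltage scaling and subthreshold operation appear among the low-power techniques discussed in this section.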
Conventional WSNs employ a variety of low-power design techniques, and a short overview 
of these approaches is presented in the following paragraphs. The circuit techniques used to 
minimize the power consumption in sensor networks are most commonly classified under the 
following categories [1]. 

Asynchronous designs are increasingly becoming an integral part of numerous WSNs [13-16] 
due to their low-power advantages. These designs are characterized by the absence of any globally 
periodic signals that act as a clock. In other words, these designs do not use any explicit clock 
circuit and hence wait for specific signals that indicate completion of an operation before they go 
on to execute the next operation. Low-power consumption, no clock distribution, fewer global 
timing issues, no clock skew problems, higher operating speed, etc., are advantages of asynchronous 
designs over synchronous designs. 

Power supply gating is also a low-power circuit technique widely used to reduce the subthreshold 
leakage current of the system [17]. This process allows unused blocks in the system to be powered 
down in order to reduce the leakage current. This technique was used in the Harvard sensor 
network system [18]. 

A subthreshold operation technique allows supply voltages ($V_{dd}$) lower than threshold voltages 
($V_{th}$) to be used for lowering the active power consumption. This technique was first used in the 
complete processor design for WSNs from the University of Michigan [19-21]. 

The aforementioned techniques can also be extended to the WPSN based on the requirements 
of the application and the power available to a sensor node. The focus of this chapter will be on 
low-power solutions to wireless passive distributed sensor node architectures. The novel low-power 
techniques described in the following sections are applicable not only to WPSN but also to RFID 
systems and RFID sensor networks (RSNs). 


23.3 Novel Low-Power Data-Driven Coding Paradigm 


Wireless digital transmission systems are known to use different data encoding techniques especially 
the variable pulse width encoding and Manchester encoding techniques. Many RF applications, 
such as RFID passive tags, RSNs, sensors, serial receivers, etc., use this type of encoding. Most 
well-known receiver decoder designs use an explicit clock to decode (Manchester or Pulse Interval) 
encoded data. On receiving encoded data from the transmitter, the clock is extracted from it. 
A classical way to decode pulse width modulated signals is oversampling with a clock 
[22,23]. The received signal is sampled with a clock whose rate is much higher than the bit rate 
of the received signal in order to decode it, as shown in Figure 23.2. Symbol-1 or symbol-0 of the received encoded data 
stream can be identified by counting the number of clock pulses within each symbol as shown 
in Figure 23.2 (2 for symbol-0 and 4 for symbol-1). The well-known architecture of this classical 
decoding scheme is shown in Figure 23.3. This architecture basically consists of high-frequency 
oscillator, fast-clocked counter, and a comparator. The major disadvantage of using high clock rate 
driven decoders is the significant increase in the power consumption at the receiver side [23-26]. 
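The counting step of this classical scheme can be sketched in a few lines of Python. The sketch assumes that the fast clock produces one sample per clock pulse, that the counts of 2 and 4 refer to the high time of each symbol, and that a threshold of 3 separates the two symbols; these modeling choices and the function name are illustrative assumptions, not details taken from Figure 23.3.

```python
def decode_by_oversampling(samples, threshold=3):
    """Count fast-clock samples in each high pulse and compare with a threshold."""
    bits = []
    count = 0
    for sample in samples:
        if sample:                 # counter runs while the input is high
            count += 1
        elif count:                # falling edge: compare and reset the counter
            bits.append("1" if count >= threshold else "0")
            count = 0
    if count:                      # pulse that extends to the end of the record
        bits.append("1" if count >= threshold else "0")
    return "".join(bits)


# Symbol-0 (2 high samples) followed by symbol-1 (4 high samples).
stream = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
print(decode_by_oversampling(stream))
```

Every sample in the loop corresponds to one cycle of the high-frequency oscillator, which is where the power cost of this approach originates.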


23.3.1 Pulse Width Coding Scheme 


A novel explicit clock-less coding scheme for communication receivers is shown in Figure 23.4 
for reducing the power consumption on the receiver side [24,27,37]. Pulse width coding (PWC) 
data shown in Figure 23.4 represent the encoded input demodulated serial data as “01100110.” 
In Figure 23.4, the PWC data signal is sampled at every rising edge of the delayed version of 
Figure 23.4 PWC scheme. 


the PWC data signal in order to differentiate “1” and “0” for completing the decoding process. 
We can clearly see from Figure 23.4 that the decoded output bit is “1” whenever both signals 
are high; otherwise it is a “0.” The most important power parameter in this decoding scheme 
is the delay ($\Delta$). The delay must be longer than $PW_0$, the pulse width of a “0,” and 
shorter than $PW_1$, the pulse width of a “1.” In other words, for successful PWC decoding, the 
delay must satisfy the condition $PW_0 < \Delta < PW_1$. The decoding mechanism proposed in Ref. [24] for this 
scheme is an extremely simple, low-power, and clock-less circuit realized using complementary 
metal-oxide-semiconductor (CMOS) digital chip design techniques. 
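The role of the delay can be illustrated with a small discrete-time simulation in Python. This is only a behavioral sketch under assumed parameters (a "0" is 2 samples wide, a "1" is 4 samples wide, and each symbol lasts 6 samples); it is not the circuit of Ref. [24], and the function names are hypothetical.

```python
def encode_pwc(bits, pw0=2, pw1=4, period=6):
    """Build a sampled PWC waveform: short pulse for "0", long pulse for "1"."""
    wave = []
    for b in bits:
        width = pw1 if b == "1" else pw0
        wave.extend([1] * width + [0] * (period - width))
    return wave


def decode_pwc(wave, delay):
    """Sample the data signal at every rising edge of its own delayed copy."""
    data = wave + [0] * delay       # pad so that both signals have equal length
    delayed = [0] * delay + wave    # delayed version of the same signal
    bits = []
    previous = 0
    for d, clk in zip(data, delayed):
        if clk == 1 and previous == 0:   # rising edge of the delayed signal
            bits.append("1" if d == 1 else "0")
        previous = clk
    return "".join(bits)


wave = encode_pwc("01100110")
print(decode_pwc(wave, delay=3))   # delay lies between the two pulse widths
print(decode_pwc(wave, delay=5))   # delay too long: every symbol reads as "0"
```

With a delay between the two pulse widths the original bit string is recovered, whereas a delay longer than the wide pulse makes both symbols indistinguishable.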

The PWC scheme can be implemented and easily integrated into other well-known synchronous 
and asynchronous design variants that have high-power-consuming decoder modules in serial data 
communication receivers. A well-known direct application is in the symbol decoding process of 
passive RFID tag and RSN node systems, while the encoding remains unchanged at the transmitter. 

In hardware, the delay generally translates to a buffer element, which is typically built from an 
even number of inverters. Using an optimized library, it is possible to further 
optimize the generated design schematic and lower the power values for the inverters that are used to 
realize large delay values. Another alternative to designing low-power inverters is to individually 
model them based on the choice of parameters such as width, length, and target technology of 
the metal-oxide-semiconductor (MOS) layout designs [29]. The inverter can be designed from 
the transistor level using the CAD (computer-aided design) layout tools. This would also give the 
designer the flexibility to alter the width of individual P-type metal-oxide-semiconductor (PMOS) 
and N-type metal-oxide-semiconductor (NMOS) transistors to generate the necessary delays within 
the circuit [30,31] conforming to the low-power requirements. 


23.3.2 Data-Driven Decoder Architecture 


The PWC scheme introduced in the previous section can be realized using the explicit clock-less 
architecture shown in Figure 23.5. This data-driven architecture does not use any explicit clock 
to drive its components such as the shift register and the comparator. The delayed encoded input 
acts as a clock to trigger the components of the decoder thus eliminating the need for any explicit 
clock. This architecture was successfully simulated using the standard CAD tools as a low-power 
and low-area data-driven decoder [24]. In Ref. [24], it has been reported that the post-layout 
power consumption of the data-driven chip was about one-fourth the power consumed by the 
conventional decoder design for the same data rate of 40 kHz. The cell area of the data-driven 
decoder is 69% smaller than that of the conventional design. Elimination of the high-frequency 
oscillator, fast-clocked counter, and an explicit clock has contributed to this significant reduction 
in both power consumption and the area occupied by the data-driven chip. 

An example passive RFID post-layout CMOS design was successfully simulated to operate at 
very low power using a custom low-power asynchronous computer [28]. This design uses the 
data-driven decoder, with a counter integrated into it, as the major low-power component of the 
entire architecture. The power consumption obtained from post-layout simulation of both the 
synchronous and asynchronous RFID designs is illustrated in Figure 23.6a for different data transmission 
rates. The switching power consumption comparisons for different data rates are also illustrated in 
Figure 23.6b. There is a consistent linear increase in switching power for the synchronous design 
when compared to the asynchronous design as the data rate increases. In other words, the switching 
power of the asynchronous design is lower when compared to the synchronous design at each 
data rate. These results are very encouraging, especially at the typical data rate, corroborating the 
Figure 23.5 Novel data-driven decoder architecture. 

Figure 23.6 (a) Asynchronous versus synchronous average power comparisons. (b) Asynchronous 
versus synchronous switching power comparisons. 


concept that a passive RFID tag can be designed using an asynchronous design to significantly 
reduce power requirements and thereby increase its read range. The same concept can also be 
applied to sensor node architecture, especially for the case where the data-driven decoder can be 
integrated with a low-power processing unit. The low-power processing unit design concept will 
be discussed in the next section. 


23.4 Distributed Architecture Design for a WPSN Node Processor 


The choice of microcontroller design for a sensor node leads to a trade-off between speed and energy 
efficiency. In most cases, power constraints dominate, which in turn leads to significant 
computational constraints. In general, a microcontroller consists of a controller, volatile memory for data 
storage, ROM/EPROM/EEPROM, parallel I/O interfaces, clock generator, serial communication 
interfaces, etc. In Ref. [6], a general-purpose low-power 16 bit programmable microcontroller 
(MSP430F2132) is used for managing the entire node. The microcontroller design can be 
tailor-made for applications to further reduce the power requirements at the node. A significant 
contribution toward achieving a low-power sensor node processor is introduced in this chapter: 
customizing a processor based on a subset of its instruction set architecture (ISA) for the 
specific target application [27]. 

SIMD is a well-known class of parallel computers in Flynn’s taxonomy. SIMD machines, which have 
multiple processing units, can perform the same operation on multiple data items simultaneously. 
Synchronization between processors is not required. The proposed 
architecture presents a PSN that can be replicated to produce an SIMD architecture. The passive 
units are powered and controlled by RF energy, which enables convenient reconfiguration because 
nodes can be addressed individually or in groups that can be simply and conveniently changed 
using RF communications. Thus, bits within the passive node processors can be set to perform or 
ignore commands, allowing dynamic reconfigurability of the units composing the SIMD. 
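The idea of setting bits so that a node performs or ignores a command can be pictured with the following Python sketch. The `PassiveNode` class, its `enabled` bit, the group labels, and the `broadcast` helper are hypothetical names introduced for illustration; the sketch shows only the addressing logic and says nothing about the RF link or the actual hardware.

```python
class PassiveNode:
    def __init__(self, node_id, group):
        self.node_id = node_id
        self.group = group
        self.enabled = True   # bit that makes the node perform or ignore commands
        self.acc = 0          # 8 bit accumulator

    def receive(self, opcode, operand, group=None):
        """Execute a broadcast command only if it is addressed to this node."""
        if not self.enabled or (group is not None and group != self.group):
            return
        if opcode == "ADD":
            self.acc = (self.acc + operand) & 0xFF
        elif opcode == "AND":
            self.acc &= operand & 0xFF
        elif opcode == "OR":
            self.acc |= operand & 0xFF
        elif opcode == "XOR":
            self.acc ^= operand & 0xFF
        else:
            raise ValueError(f"unsupported opcode: {opcode}")


def broadcast(nodes, opcode, operand, group=None):
    """Single instruction sent to all nodes; each node applies it to its own data."""
    for node in nodes:
        node.receive(opcode, operand, group)


nodes = [PassiveNode(1, "a"), PassiveNode(2, "a"), PassiveNode(3, "b")]
broadcast(nodes, "ADD", 5)             # every node executes the command
nodes[1].enabled = False               # node 2 is told to ignore commands
broadcast(nodes, "ADD", 1, group="a")  # only enabled nodes in group "a" react
print([n.acc for n in nodes])
```

Changing the `enabled` bit or the group label at run time is the software analogue of the dynamic reconfigurability described above.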

Consider an intelligent sensor network topology with a base station (sink node) in the center 
communicating with several wireless PSNs around it as shown in Figure 23.7 [27]. In a WPSN, a 
PSN is passive and is powered by the impinging RF wave, which is also used for communication, 
from a sink node. The major change with respect to the architecture of a PSN is only the processing unit 
as shown in Figure 23.1. The sink node acts as the RF source assumed to have unlimited power that 
feeds the PSN with RF power. The sink node transmits RF power to the randomly deployed PSN 
nodes for processing, sensing, and data collection activities. This sink node wirelessly transmits 
commands to the PSN that executes these commands and responds back to the sink node. A PSN 
can be implemented as a CMOS chip that provides logic to respond to commands from a sink 
node. Thus, the sink node and the PSN combination can be viewed as a complete processor or 
as multiple processing units [38]. This will form the basis of our distributed concept that will be 
introduced in the following paragraphs. 

The sink node (Control and Memory [C&M]) is an RF equipped control and storage base 
station, and the PSN processor is an execution unit with minimal storage capacity (e.g., registers) 
as shown in Figure 23.8. The sink node is allowed the flexibility to be a classical von Neumann- or 
Harvard-type architecture that consists of an interrogation control unit along with a program and 
memory units. Commands will be stored on the powered sink node that transmits the commands 
wirelessly to the PSN. The intent is to keep the PSN processor as simple as possible so as to maintain 
Figure 23.7 WPSN topology with PSN fed by an RF source-sink node. 

Figure 23.8 High-level distributed WPSN node processor architecture. 


low-power requirements and/or extend read range from the sink node. Any unnecessary complexity 
on the PSN processor will be moved onto the powered sink node. This would significantly reduce 
the hardware on the PSN side, thus reducing the overall power consumption of the node. 

Many other circuit design techniques can be applied to the sensor node architectures to reduce 
power consumption. One of the main power reduction techniques that can be employed in the 
proposed PSN processor design is to eliminate the overhead of an explicit clock [28]. The 
entire circuitry of the PSN processor is to be asynchronous with the remote command execution 
controlled by the sink node making the system programmable and reconfigurable. The design 
uses a clock-less data-driven symbol decoder introduced earlier as a low-power component instead 
of the conventional input clocked data decoding process used at the sensor nodes [24]. Another 
technique commonly used to reduce power consumption is scaling down the supply voltage of the 
system or part of the system [1]. Any combination of these energy-efficient techniques can be used 
in addition to the distributed architecture concept based on application-specific requirements. 


23.4.1 Exploring the 8051 Microcontroller and Its ISA for WPSN Applications 


The choice of the Intel 8051 (i8051) is justified by the fact that it is still one of the most popular 
embedded processors. Furthermore, due to its small size and low cost, it has numerous applications 
where power efficiency is necessary. The most commonly used 8051 microcontroller in sensor 
nodes will be considered as an example for exploring its ISA and its application to the proposed 
conceptual distributed design. 

The 8051 is an 8 bit microcontroller that includes an IS of 255 operation codes. The 8051 
architecture consists of five major blocks, namely, control unit, ALU, decoder, ROM, and RAM. 
Based on the distributed design concept introduced as shown in Figure 23.8, the WPSN node 
processor consists of two major blocks with respect to 8051: 8051 compatible execution unit and 
the minimum number of temporary storage registers required. The execution unit is mainly an 8051 
ALU. The number of instructions supported by the execution unit depends on the target application. 

The sink node transmits the program instructions to the WPSN node, which executes these 
instructions and returns the results to the sink node. The WPSN node will, for example, be able 
to perform functions such as OR, XOR, AND, and ADD that are compatible with the 
8051, depending on the application. The sink node and the WPSN node together form a complete 
processor. Because the program to be executed by the WPSN node is stored in the sink node, the need 
for program memory at the passive node is eliminated. Local scratch-pad memory may still be 
needed at the WPSN node, although the number of bytes is drastically reduced to 
satisfy the power requirements. The WPSN node executes the instructions issued wirelessly by 
the sink node. 

Figure 23.9 Sequence diagram for an ADD operation. 

Figure 23.9 represents a high-level sequence diagram for an ADD operation. Consider the 
operation ADD A, R1 (A = A + R1), where R1 denotes one of the eight 8-bit 
8051 working registers (R0-R7) of a selected register bank and A denotes the 8-bit accumulator register. 
The sink node sends the R1 value, which is loaded and stored in the temporary storage of the WPSN 
node processor unit. On receiving the ADD instruction, the passive node processor's execution 
unit adds the new R1 value to the value already held in the accumulator. The result in the 
accumulator register is sent back to the sink node. The sink 
node contains the main memory, which acts as the storage area for the majority of data items. 
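
The ADD sequence described above can be sketched in a few lines of Python. This is only an illustrative model, not the authors' implementation: the class and function names (PassiveNode, sink_issue_add) are hypothetical, the wireless link is replaced by direct method calls, and the carry handling follows standard 8051 ADD behavior.

```python
class PassiveNode:
    """Toy model of the WPSN execution unit: accumulator A, registers R0-R7."""

    def __init__(self, accumulator=0):
        self.a = accumulator & 0xFF   # 8-bit accumulator
        self.r = [0] * 8              # temporary storage R0-R7
        self.carry = 0                # carry flag produced by ADD

    def mov_rn_data(self, n, data):
        """MOV Rn, #data: store a value received from the sink node."""
        self.r[n] = data & 0xFF

    def add_a_rn(self, n):
        """ADD A, Rn: add Rn to A, keep 8 bits, and record the carry."""
        total = self.a + self.r[n]
        self.carry = 1 if total > 0xFF else 0
        self.a = total & 0xFF
        return self.a                 # result returned to the sink node


def sink_issue_add(node, r1_value):
    """Sink-side sequence: load R1 on the node, then request ADD A, R1."""
    node.mov_rn_data(1, r1_value)
    return node.add_a_rn(1)


node = PassiveNode(accumulator=0x20)
result = sink_issue_add(node, 0x15)
print(hex(result), node.carry)        # prints: 0x35 0
```

In this sketch the node keeps no program of its own; every step is triggered by the sink, which mirrors the division of labor described in the text.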

Sensor applications require special-purpose hardware that caters to differing sets of 
requirements. The characteristics of the target applications and the utility of the sensors make it 
important to choose applicable hardware for sensor networks on a case-by-case basis. The power 
requirements of a WPSN constrain the applications that can be supported. Some 
well-known basic core algorithms form a class of simple applications, such as sum-array 
(sum of all values in a list), Top10 (finds the top 10 values in a list), majority consensus (finds the 
majority values in a list), min-max finder (finds the minimum and maximum values in a list), binary 
search (typical search algorithm for a sorted list), matrix multiplication (for 
small matrices), etc. [32]. 

Generally, sensor networks filter data at the node so that not every sensor sample 
needs to be transmitted over the radio, which would otherwise consume all the wireless bandwidth available to 
the network. Transmitting only the necessary sensor readings over the radio also saves the 
stored energy available on the node. Consider a simple application scenario: the 
sum-array application implemented with the 8051 ISA. The amount of temporary storage and the ALU capabilities 
of the WPSN node processor are chosen to keep power requirements low. Assume that the 
only temporary memory space available to the execution unit is registers R0-R7 of a single selected 
register bank of the 8051. The major function is an ADD operation, and hence the 
arithmetic instructions included in the execution unit on the WPSN node are ADD A, 
Rn and ADDC A, Rn. The minimal data transfer instructions necessary are MOV A, Rn; 
MOV Rn, A; and MOV Rn, #DATA (8 bit). The WPSN node processor supports only 
those features required to interface and communicate with the sink node. Therefore, the branch, 
comparison, load, and store instructions are implemented on the sink node side rather than on 
the passive side. This ISA is compatible with the i8051 ISA. Additional instructions can be 
added to enhance the capability of the execution unit, depending on the application. 
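
As a rough illustration of the sum-array scenario, the sketch below models a node that supports only the five instructions listed above, while the loop over the array stays on the sink side. The names (ExecutionUnit, sum_array) and the register allocation (R1 for the high byte, R2 as scratch for the low byte) are assumptions made for this example; the text does not prescribe them. The sketch relies on the standard 8051 property that MOV between A and Rn leaves the carry flag unchanged.

```python
class ExecutionUnit:
    """Toy WPSN execution unit supporting only the five listed instructions."""

    def __init__(self):
        self.a = 0          # accumulator
        self.cy = 0         # carry flag
        self.r = [0] * 8    # registers R0-R7

    def run(self, op, n=0, data=0):
        if op == "MOV A,Rn":
            self.a = self.r[n]
        elif op == "MOV Rn,A":
            self.r[n] = self.a
        elif op == "MOV Rn,#data":
            self.r[n] = data & 0xFF
        elif op in ("ADD A,Rn", "ADDC A,Rn"):
            carry_in = self.cy if op == "ADDC A,Rn" else 0
            total = self.a + self.r[n] + carry_in
            self.cy = total >> 8          # 1 if the 8-bit sum overflowed
            self.a = total & 0xFF
        else:
            raise ValueError(f"unsupported instruction: {op}")
        return self.a                     # value reported back to the sink


def sum_array(values):
    """Sink-side loop: 16-bit sum of 8-bit values (wraps modulo 65536)."""
    node = ExecutionUnit()
    for v in values:                      # looping is done by the sink node
        node.run("MOV Rn,#data", 0, v)
        node.run("ADD A,Rn", 0)           # low byte += v, carry out in CY
        node.run("MOV Rn,A", 2)           # park the low byte in R2
        node.run("MOV A,Rn", 1)           # fetch the high byte from R1
        node.run("MOV Rn,#data", 0, 0)
        node.run("ADDC A,Rn", 0)          # high byte += carry
        node.run("MOV Rn,A", 1)
        node.run("MOV A,Rn", 2)           # restore the low byte to A
    low = node.a
    high = node.run("MOV A,Rn", 1)
    return (high << 8) | low


print(sum_array([200, 100, 50]))          # prints: 350
```

Eight instructions are exchanged per array element here, which makes the communication versus computation trade-off discussed next quite visible.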

To arrive at an energy-efficient computation solution, there is always a trade-off between the 
communications from and computation on a sensor node. Hence the choice of a design for a sensor 
node architecture depends not only on the low-power technique but also on the application space. 


23.5 Data-Driven Architecture Design Flow Methodology 


A data-driven architecture is a design paradigm that uses no explicit clock to drive its components 
[24]. Instead, either data or local signals drive the components of the processor. This type of 
asynchronous design uses no global periodic signals to synchronize its operations. The lack of strong 
support from commercial CAD tools is a major hurdle for the synthesis of explicit clock-less designs. 
Asynchronous designs written in the very-high-speed integrated circuit hardware description language (VHDL) 
generally use non-synthesizable delay constructs such as wait, delay, etc., in their 
implementation, in the absence of a global periodic signal for synchronization. Standard VHDL 
compilers are not known to synthesize VHDL code that implements an asynchronous design. Most 
asynchronous design methodologies that have been proposed on the basis of conventional hardware 
description languages [33-35] are not accessible to standard high-level design tools. This section 
describes a high-level data-driven design flow that requires minimal changes to a 
traditional synchronous flow [24,27,39]. 

Step 1: The data-driven design is first written in VHDL along with the necessary 
non-synthesizable delay constructs. This VHDL design is then simulated using Mentor Graphics' 
ModelSim. A customized test bench is used to verify the correct functionality of the design in 
ModelSim. 

Step 2: A synthesized netlist is generated for the data-driven design using the Synopsys Design 
Compiler upon the successful verification of the VHDL design in step 1. 

The “dc_shell” command interface provides a script execution environment based on the Tool 
Command Language (TCL). The basic directives of a TCL script include the setup of environment 
variables, constraints, basic compilation directives, etc. The major modification is to eliminate 
any clock from the script. All statements that involve the non-synthesizable VHDL delay constructs 
are identified and removed. Synthesizable delay commands must be inserted separately into the 
TCL script during the synthesis process. The available delay commands are set_max_delay and 
set_min_delay. These commands have several options and generally need a -from set of start points and a -to 
set of end points along with a fixed target delay value. The start and end points refer to specific 
cells and their corresponding input and output pins in the schematic of the design. Identifying 
the necessary start and end points is the key to the correct operation of the design, because 
inserting these delays changes the timing graph of the design. 
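
To make the shape of these constraints concrete, the following sketch generates a pair of set_min_delay and set_max_delay lines for one path, ready to be pasted into a synthesis script. It is an illustration only: the helper name delay_constraints, the pin names, and the delay values are hypothetical, and the actual start and end points must be taken from the synthesized schematic of the design in question.

```python
def delay_constraints(start_pin, end_pin, min_delay, max_delay):
    """Return TCL constraint lines bounding the delay of one path.

    start_pin and end_pin are placeholder pin names; the delay values are
    in the time unit of the target technology library.
    """
    if min_delay > max_delay:
        raise ValueError("min_delay must not exceed max_delay")
    path = f"-from [get_pins {start_pin}] -to [get_pins {end_pin}]"
    return [
        f"set_min_delay {min_delay} {path}",
        f"set_max_delay {max_delay} {path}",
    ]


# Hypothetical path between two stages of a data-driven decoder.
lines = delay_constraints("u_dec/stage1/Z", "u_dec/stage2/A", 0.5, 2.0)
print("\n".join(lines))
```

Generating the lines from a table of paths is one way to keep the inserted delays consistent as the netlist changes; writing them by hand in the script works equally well for a small design.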

The next step is to compile, simulate, and verify the generated design netlist Verilog file along 
with the target technology library using ModelSim. 

Step 3: After the successful post-synthesis verification, the design layout and the post 
place-and-route netlist are generated using the Cadence Encounter layout tool. 
The post place-and-route design netlist is simulated and verified for the expected operation using 
ModelSim along with a delay file. After the successful post-layout simulation of the netlist, 
the layout verification of the data-driven design is complete. 

The final step is to estimate the power consumed by the data-driven design using Cadence 
Encounter. A switching activity file is generated from the initial test bench using ModelSim. 
During the final stage of the place-and-route process, the activity file is used in the power rail 
analysis option available in Encounter to produce the power report. 

The advantage of using data-driven designs is not only for achieving low power but also 
for allowing flexibility to implement explicit clock-less designs using standard CAD tools. The 
data-driven symbol decoder introduced in Ref. [24] was implemented using this methodology. 
This methodology can also be applied to the distributed WPSN 8051-node architecture when 
implemented as a data-driven design. The design flow used for data-driven designs can also be 
extended to a variety of design paradigms such as synchronous, asynchronous, globally asynchronous 
locally synchronous, and globally synchronous locally asynchronous [24]. 


23.6 Conclusion 


Conventional WSNs are disposable systems because they depend on the limited lifetime of their 
batteries. A WPSN system of passive nodes does not face this problem because its nodes are remotely powered 
by an RF source, but they do have limited range. This chapter discusses novel low-power solutions 
to increase the range of WPSN nodes. It illustrates the importance of the data-driven 
architecture using a novel clock-less symbol decoder architecture and its low-power applications, which 
include synchronous and asynchronous design variants that have high-power-consuming modules 
in communication receivers. A detailed power analysis based on the post-layout simulation results 
of the data-driven symbol decoder and the conventional clocked symbol decoder for passive RF 
receiver systems has been presented. The chapter also introduces the elements and concepts of the design 
of a WPSN node processor as a distributed architecture that operates remotely and wirelessly from 
the sink node. A high-level asynchronous design flow that can be used to implement the 
data-driven design using synchronous CAD tools is also discussed. This design flow will provide the 
reader with sufficient guidelines to design and implement application-specific data-driven processor 
architectures. 

This research has the potential to realize WPSN node applications, especially in the environmental, 
structural, and medical fields, while providing the basis for a programmable, reconfigurable, and 
low-power passive processing unit for distributed computing. 

Currently, our research work is focused on developing a low-power distributed 8051-SIMD 
architecture as a clock-less design based on the concepts described in this chapter. 


24.1 Introduction 


Radio frequency identification (RFID) technology refers to the use of multiple tags being attached 
to various items that are scanned and recorded in a database. The process itself eliminates mundane 
tasks such as conducting manual inventory checking and counting or using a barcode scanner to 
count the items to be purchased at a supermarket checkout. Modern integrations of this technology 
include baggage tracking at airports, pet owner identification, and tagging objects in stores to 
enforce security by alerting management when an item has left the facility without the tag being 
deactivated. Despite the wide-scale adoption and advantages of RFID, several issues exist that 
introduce a level of unreliability resulting in the technology being used only in a fraction of its 
potential applications. 

Because RFID technology relies on radio signals to collect tag data automatically, certain 
anomalies are also recorded, lowering the quality of the captured observations. These anomalies 
are classified as false-positives, data that are captured but are not meant to be, and 
false-negatives, data that are not present in the data set when they are meant to be. False-positives 
usually arise when an observation is captured twice as a duplicate reading when it is 
only meant to be recorded once, or when data are captured by a reader although the tags 
are not within its normal range. False-negatives usually occur due to external factors such as 
interference from water or metal, or tags interfering and colliding with each other on the air 
interface. 

Methodologies have been proposed in the literature to correct the anomalies at the 
different stages of the RFID capturing process: the physical, middleware, and deferred stages. 
Common physical solutions involve altering the location or conditions in which the data are 
physically captured, such as modifying the orientation of the objects or attaching additional tags to 
increase the likelihood of scanning the object. Middleware solutions involve the use of algorithms 
such as anti-collision techniques or filters to correct data as they are read into the reader. Finally, 
deferred solutions employ algorithms to process captured observations after they have been stored 
within the data warehouse. Each of these methodologies, however, either introduces a number of 
artificially created anomalies or lacks the intelligence to determine the correct approach to 
eliminating the anomalies. 

To reduce the number of unintentional artificial anomalies and increase the intelligence of 
the correction approach, we have introduced various novel techniques that operate both before and after the 
storage of captured data. In this chapter, we examine these techniques, 
which include both probabilistic and deterministic anti-collision algorithms and various machine-learning 
algorithms designed specifically to identify and correct both false-positive and false-negative 
anomalies. With regard to the tree-based pre-processing approach, we reduce the tag starvation 
problem, so that missed readings have an increased chance of being recorded, and compare the approach 
with existing state-of-the-art methodologies. Within the machine-learning deferred approaches, 
we examine the use of Bayesian network, neural network, and non-monotonic reasoning 
classifiers in a novel application designed to both detect and correct an anomaly where possible, 
and identify the highest achieving techniques. 


24.2 Background 


RFID technology refers to the use of multiple tags being attached to various items that are scanned 
and recorded in a database. Modern integrations of this technology include baggage tracking at 
airports, pet owner identification, and tagging objects in stores to enforce security by alerting 
management when an item has left the facility without the tag being deactivated. 

Because RFID technology relies on radio signals to collect tag data automatically, certain 
anomalies are also recorded, lowering the quality of the captured observations. These anomalies are 
classified as false-positives, data that are captured but are not meant to be, and false-negatives, data that are 
not present in the data set when they are meant to be. Once these 
issues are significantly reduced, it will be possible to integrate RFID technology into numerous commercial sectors, resulting in 
an increase in both the efficiency and effectiveness of business processes. 

To combat the highly ambiguous anomalies in RFID data sets, intelligent anti-collision protocols, 
such as deterministic and probabilistic anti-collision algorithms, along with classifiers, such 
as Bayesian networks, neural networks, and non-monotonic reasoning, may be employed. 
Deterministic methods begin the identification process by issuing a prefix until they get matching tags. 
They then continue to ask for additional prefixes until all tags within the region are found. In 
contrast, probabilistic methods allow tags to respond at randomly generated times. If a collision 
occurs, colliding tags have to identify themselves again after waiting for a random period of 
time. Bayesian networks operate by finding the most probable conclusion when given a list of 
observations. Similarly, a neural network takes a list of observations as its feature set, 
and the trained network determines which conclusion is to be drawn. Finally, non-monotonic 
reasoning (NMR) operates on a set of rules defined by the user, including the precedence of 
these rules, and uses them to determine its conclusion. 


24.2.1 Radio Frequency Identification 


RFID has a long history, from its utilization during the Second World War to its 
modern usage [1]. The basic architecture of RFID consists of a tag, a reader, and middleware 
that performs advanced analysis on the data, making it practical for use in many applications with 
beneficial outcomes. Several problems arise when using passive tags due to the 
nature of the system, in particular the number of unreliable readings in the raw data. 


24.2.1.1 System Architecture 


The system architecture of an RFID system contains four important components [2]: an RFID 
tag, an RFID reader, the RFID middleware, and the database storage. For a diagram representing 
the flow of information in this system architecture, see Figure 24.1. 

The RFID tag is the simplest, lowest level component of the RFID system architecture. These 
tags come in three types: passive, semi-passive, and active. The tag itself is made up of three 
different parts: the chip that holds the information the tag is to dispense, the antenna that is used 
to transmit the signal, and the packaging that houses the chip and antenna and may be applied 
to the surface of other items [2]. Passive tags are the most error prone but, due to the lack of a 
battery, also the most cost effective and long lasting [3]. 

Electromagnetic pulses emitted from the readers give the passive tag enough energy to transmit 
its identification back [4]. In comparison, the semi-passive tag has a battery source attached to it. 
However, the battery is only utilized to extend the readability range, as the tag uses the reader’s pulse 
to transmit its information, resulting in a shorter life span but increased observation integrity. The 
final tag is the active tag, which utilizes a battery not only to extend its range but also to transmit 
its identification number. Because of its heavy reliance on the battery, this tag has the highest cost and 
shortest life span of all the tags currently available. The active tag also has advanced features not 
available in the other types, such as communicating with other tags within proximity [2]. Even 
today, there are novel and emerging technologies to reduce the production cost even further, such as 
chipless RFID system tags [5,6] and readers [7], which will be integrated into future applications 
[8,9] such as space exploration [10] and airport baggage tracking [11]. 

The RFID readers are the machines used to record the tag identifiers and attach a timestamp to 
the observation. They do this by emitting a wave of electromagnetic energy that interrogates 
the tags until they have responded. These devices serve a greater purpose when 
interrogating passive and semi-passive tags, as they also provide the power necessary to transmit the 
information back. Readers, like tags, come in a variety of types, such as the handheld reader and 
the mounted reader. The mobile handheld readers are used mainly for determining which objects are 
present within a group, for example, when taking stock of items within a supermarket. In 
comparison, the mounted readers are fixed in their geographical locations and used primarily to track 
items moving through their zones, such as items on a conveyor belt. 

Figure 24.1 Flow of information between the different components of the RFID system 
architecture. 

The middleware, also commonly known as the savant or edge system, is the layer at which the 
raw RFID readings are cleaned and filtered to make the data more application friendly. It receives 
information passed to it from the readers and then applies techniques such as anti-collision and 
smoothing algorithms to correct simple missing and duplicate anomalies [12,13]. The filtered 
observational records, including the tag and reader identifiers along with the timestamp at which the reading 
was taken, are then passed on to the database storage. 

At the end of the data capture cycle, observational records are passed into a common area where 
all readings are archived. This component is known as the database storage and is used to hold 
all information that is streamed from the readers. In most cases, the large amount of 
interrogation required to read every tag constantly results in a massive flood of data. For 
example, it has been stated that Wal-Mart has generated 7 TB of data daily from its integration of 
RFID systems [14]. Having all information stored in a central database also allows for higher level 
processes such as data cleaning, data mining, and analytical evaluations. 


24.2.1.2 Format of Observations 


The format of the data recorded in the database after a tag has been read consists of three primary 
pieces of information: the electronic product code (EPC), the identifier of the reader that made the 
observation, and the timestamp that contains the time the reading occurred. Table 24.1 contains 
information typically stored in the database storage. 

The EPC is a unique identification number introduced by the Auto-ID Center and given to 
each RFID tag. It is a 96-bit, 25-character-long code containing numbers and letters. 
The number itself is made up of a header of 8 bits, an EPC manager of 28 bits, an object class (OC) 
of 24 bits, and a serial number (SN) of 36 bits [15]. Ward and Kranenburg state that a possible 
alternative to using the EPC is to employ IPv6, the advanced version of Internet addressing 
that will take over from the current system, IPv4 [15]. It is estimated that, since IPv6 will 
have 430 quintillion Internet addresses as opposed to the current 4 billion address limit, there will 
be enough addresses for all items being tracked with RFID. 
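
The field layout above (8 + 28 + 24 + 36 = 96 bits) can be unpacked with a few shifts and masks. The sketch below assumes that the 96 bits are written as 24 hexadecimal digits; the function name parse_epc and the sample EPC value are made up for illustration and do not correspond to a real tag.

```python
def parse_epc(epc_hex):
    """Split a 96-bit EPC, given as 24 hex digits, into its four fields."""
    if len(epc_hex) != 24:
        raise ValueError("expected 24 hexadecimal digits (96 bits)")
    value = int(epc_hex, 16)
    return {
        "header": (value >> 88) & 0xFF,                    # 8 bits
        "epc_manager": (value >> 60) & ((1 << 28) - 1),    # 28 bits
        "object_class": (value >> 36) & ((1 << 24) - 1),   # 24 bits
        "serial_number": value & ((1 << 36) - 1),          # 36 bits
    }


# Hypothetical EPC: header 0x30, manager 0xE50, class 0x23C0, serial 0x431BA3F.
fields = parse_epc("300000E500023C000431BA3F")
for name, field in fields.items():
    print(f"{name}: {field:#x}")
```

Working on the integer value rather than on substrings keeps the code correct for fields such as the 28-bit manager and 36-bit serial number, which happen to align to hex digits here but are defined in bits.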


Table 24.1 Table Populated with Sample RFID Data 
Containing Information about EPC, Reader, and Timestamp 

EPC                        Reader    Timestamp 
030000E500023C000431BA3    001       2008-07-29 14:05:08.002 
030000E500023C000431BA3    003       2008-07-29 14:32:12.042 
030000E500023C000431BA3    002       2008-07-29 14:45:54.028 
030000E500023C000431BA3    004       2008-07-29 15:02:06.029 
030000E500023C000431BA3    007       2008-07-29 15:18:49.016 

The reader identifier attribute is the unique identifier of the reader, which informs the analyzer 
of which reader took the EPC reading. If the reader is also static in its location, 
the position of the reading may be derived later from a simple database query using this 
value. Knowledge of the geographical location of each unique reader identifier may also provide 
additional information needed for future business processes. The timestamp contains a temporal 
reading used to identify the date and time at which the tag passed within the vicinity of the reader. For 
example, 2008-07-29 14:05:08.002 would be stored as a timestamp. 
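
As a small illustration of this record format, the sketch below parses one row of Table 24.1 into a typed record and derives the position of the reading from the reader identifier. The Observation class, the helper names, and the READER_LOCATIONS mapping (including its location labels) are hypothetical; in a deployment the mapping would come from the database.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    epc: str
    reader_id: str
    timestamp: datetime


# Hypothetical mapping from static reader identifiers to locations.
READER_LOCATIONS = {"001": "receiving dock", "003": "storage aisle"}


def parse_observation(line):
    """Parse 'EPC reader date time' as laid out in Table 24.1."""
    epc, reader_id, date_part, time_part = line.split()
    stamp = datetime.strptime(f"{date_part} {time_part}",
                              "%Y-%m-%d %H:%M:%S.%f")
    return Observation(epc, reader_id, stamp)


def location_of(observation):
    """Derive the reading's position, or None if the reader is unknown."""
    return READER_LOCATIONS.get(observation.reader_id)


obs = parse_observation("030000E500023C000431BA3 001 2008-07-29 14:05:08.002")
print(obs.reader_id, obs.timestamp.isoformat(), location_of(obs))
```

Keeping the timestamp as a datetime value rather than a string makes later temporal comparisons between observations straightforward.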


24.2.1.3 RFID Anomalies 


Certain challenges are associated with the nature of RFID technology [16,17]. These 
challenges include low-level data, error-prone data, high data volumes, and its spatial and temporal 
aspects. With regard to error-prone data, RFID observations suffer from three kinds of anomalies 
that are recorded along with actual RFID readings. The first is the wrong reading, in which data are 
captured where they should not be. The second is the duplicate reading, in which a tag is observed 
twice rather than once. The third is the missed reading, which occurs when a tag is not read when 
and where the object it is attached to should have been physically within proximity. 


24.2.2 Collision Handling in RFID Data Streams 


RFID collision handling is one of the most heavily researched topics because it is a very important 
step in determining the quality of captured data. Better data quality at the earlier stage of 
data processing means that less complex algorithms are needed for RFID event processing and database 
management. This section explains each type of collision and surveys existing deterministic 
and probabilistic anti-collision methods. 


24.2.2.1 RFID Collision Types 


Simultaneous transmissions in RFID systems lead to collisions as the readers and tags typically 
operate on the same channel. Three types of collisions are possible: reader-to-reader collision, 
reader-to-tag collision, and tag-to-tag collision [18]. 


■ Reader-to-reader collisions: Interference occurs when one reader transmits a signal that 
interferes with the operation of another reader and prevents the second reader from 
communicating with tags in its interrogation zone. Reader-to-reader collisions can be easily avoided 
by choosing a reader deployment that prevents direct signal interference 
between two or more readers. 

■ Reader-to-tag collisions: Interference occurs when one tag is simultaneously located in the 
interrogation zone of two or more readers, and more than one reader attempts to 
communicate with that tag at the same time. 

■ Tag-to-tag collisions: Tag collision in RFID systems happens when multiple tags are 
energized by the RFID reader simultaneously and reflect their respective signals back to the 
reader at the same time. This problem is often seen whenever a large volume of tags 
must be read together in the same reader zone. The reader is unable to differentiate these 
signals. 


24.2.2.2 Deterministic Anti-Collision Protocols 


Deterministic methods can be classified into memory tree-based algorithms and memoryless 
tree-based algorithms. In the memory algorithms, which can be grouped into “tree splitting,” “binary 
search,” and “bit arbitration,” the reader's inquiries and the responses of the tags are stored and 
managed in the tag memory. This increases the equipment cost, especially for RFID tags. In 
contrast, in the memoryless algorithms, the responses of the tags are not determined by the reader's 
previous inquiries. The tags’ responses are determined only by the reader’s present inquiry, so that 
the cost of the tags can be minimized. “Query Tree” (QT) is classified as a memoryless algorithm. 

Depending on the number of tags that respond to the interrogator, there are three cycles of 
communication between tag and reader in deterministic approaches. 


■ Collision cycle: A collision cycle occurs when more than one tag responds to the reader. 
The reader cannot identify the IDs of the tags. 

■ Idle cycle: An idle cycle occurs when there is no response from any tag to the reader. This type of 
cycle is unnecessary and should be minimized. 

■ Successful cycle: A successful cycle happens when exactly one tag responds to the reader and the 
reader can identify the ID of that tag. 


For tree-based anti-collision, we focus on QT-based protocols because they are the most widely accepted 
and effective anti-collision technique for passive UHF tags [19]. The QT [20] is a data structure for 
representing prefixes that are sent by the RFID reader. The QT algorithm consists of loops, and in 
each loop, the reader issues a query with a specific prefix, and the matching tags respond with their 
information. If only one tag replies, the reader successfully recognizes the tag. If more than one tag 
tries to respond to the reader’s query, a tag collision occurs and the reader cannot get any information 
about the tags. The reader can, however, recognize that tags whose IDs match the 
query exist. To further identify the collided tags, the QT algorithm queries with prefixes that are 1 bit longer 
in the next round of identification. By extending the prefixes, the reader can recognize all the tags. 

Figure 24.2 displays an example of a QT procedure. The identification process starts at level 
one of the tree, where QT uses tag IDs to split the tag set. Tag 1010 is successfully 
identified in the first round because, of the three tags, only Tag 1010 has “1” as the first bit of 
its ID string. In the second round of identification, an idle cycle is created, as no tag starts with 
“00” as its first two bits. In the third round of identification, the other two tags, Tag 0100 and 
Tag 0111, are successfully identified. 
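
The QT procedure in this example can be reproduced with a short breadth-first simulation. This is a minimal sketch rather than a protocol implementation: the function name query_tree is hypothetical, tag IDs are assumed to be unique bit strings, and each "round" of the text corresponds to one prefix length in the sketch.

```python
from collections import deque


def query_tree(tag_ids):
    """Simulate memoryless Query Tree identification.

    Returns the tags in the order they are identified and a log of
    (prefix, cycle type) pairs, one per reader query.
    """
    identified = []
    log = []
    queue = deque(["0", "1"])             # level-one prefixes
    while queue:
        prefix = queue.popleft()
        matches = [tag for tag in tag_ids if tag.startswith(prefix)]
        if not matches:
            log.append((prefix, "idle"))
        elif len(matches) == 1:
            log.append((prefix, "successful"))
            identified.append(matches[0])
        else:
            log.append((prefix, "collision"))
            queue.append(prefix + "0")    # retry with prefixes 1 bit longer
            queue.append(prefix + "1")
    return identified, log


identified, log = query_tree(["1010", "0100", "0111"])
for prefix, cycle in log:
    print(f"query {prefix:<4} -> {cycle}")
```

For the three tags of the example, the log contains two collision cycles (prefixes 0 and 01), one idle cycle (prefix 00), and three successful cycles, so half of the six queries do not identify a tag. Reducing such collision and idle cycles is the motivation for improved tree-based protocols.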


[Figure 24.2: query tree with leaf nodes Tag 1010, Tag 0100, and Tag 0111; the legend marks collision, idle, successful, and skip cycles.] 


Figure 24.2 Query tree memoryless-based anti-collision protocol. 


24.2.2.3 Probabilistic Anti-Collision Protocols 


In a probabilistic approach, tags respond to readers at randomly generated times. If a collision 
occurs, colliding tags have to identify themselves again after waiting a random period of time 
[19]. When we mention the probabilistic anti-collision approach in RFID, we usually refer to the 
ALOHA-based approach, which is the most widely used type of anti-collision. “Slotted ALOHA” 
[21], which introduces discrete time slots in which tags are identified by the reader at specific times, was 
first employed as an anti-collision method in the early days of RFID technology. The principle 
of slotted ALOHA techniques is based on the “pure ALOHA” introduced in the early 1970s [22], 
where each tag is identified randomly. To improve the performance and throughput rate, different 
anti-collision schemes have been suggested in the literature. The “framed-slotted ALOHA” technique 
is the most improved ALOHA-based technique currently applied in many applications. The most 
accepted framed-slotted ALOHA technique is the “dynamic framed-slotted ALOHA (DFSA).” 

In DFSA, each tag in an interrogation zone selects one of the given slots in which to transmit its 
identifier, and all tags will be recognized after a few frames. Each frame is formed of a specific 
number of slots that are used for communication between the reader and the tags. To determine the 
frame-size, the reader gathers and uses information such as the number of successful slots, empty slots, and collision 
slots from the previous round to predict the appropriate frame-size for the next identification round 
[23-26]. DFSA can identify tags efficiently because the reader adjusts the frame-size according to 
the estimated number of tags. However, changing the frame-size alone cannot sufficiently reduce 
tag collisions when there are a large number of tags, because the frame-size cannot be increased indefinitely. 
DFSA has various versions depending on the tag estimation method used. Several studies 
have sought to improve the accuracy of the frame-size by implementing frame-size estimation 
techniques [27-30]. According to the DFSA protocol, the reader picks tags within an interrogation 
zone with the command “Select,” then issues “Query,” which contains a “Q” parameter that specifies the 
frame-size (frame-size $F = 2^Q - 1$). Each selected tag picks a random number between 0 and 
$2^Q - 1$ and places it into its slot counter. The tag that picks zero as its slot number responds 
and backscatters its EPC to the reader. Then, the reader issues the “QueryRep” or “QueryAdjust” 
command to initiate another slot [31,32]. 

Similar to tree-based anti-collision, there are three kinds of slots in ALOHA-based 
anti-collision, as shown in Figure 24.3: (1) an empty slot, where there is no tag reply; (2) a successful slot, 
where there is only one tag reply; and (3) a collision slot, where there is more than one tag reply. The 
term “initial Q” refers to the first “Q” or frame-size, which applies to a specific identification cycle. 
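
The three slot outcomes can be counted for a single frame with the sketch below. It is a simplified model under stated assumptions: every tag draws its slot counter uniformly from 0 to $2^Q - 1$ (so the sketch enumerates $2^Q$ slots), there is no frame-size adaptation, and the function names classify_slots and run_frame are hypothetical.

```python
import random
from collections import Counter


def classify_slots(slot_choices, q):
    """Count empty, successful, and collision slots in one frame.

    slot_choices maps each tag to the slot counter value it drew.
    """
    replies = Counter(slot_choices.values())
    outcome = {"empty": 0, "successful": 0, "collision": 0}
    for slot in range(2 ** q):
        n = replies.get(slot, 0)
        if n == 0:
            outcome["empty"] += 1
        elif n == 1:
            outcome["successful"] += 1
        else:
            outcome["collision"] += 1
    return outcome


def run_frame(tag_ids, q, rng):
    """Let every tag pick a random slot counter, then classify the frame."""
    choices = {tag: rng.randrange(2 ** q) for tag in tag_ids}
    return choices, classify_slots(choices, q)


# Fixed choices: two tags collide in slot 0, one tag succeeds in slot 2.
print(classify_slots({"tag1": 0, "tag2": 0, "tag3": 2}, q=2))

# Random frame for 20 tags with Q = 4 (16 slots); the seed is arbitrary.
choices, outcome = run_frame([f"tag{i}" for i in range(20)], 4, random.Random(1))
print(outcome)
```

The counts returned for one frame are exactly the information that DFSA feeds into its tag estimation to choose the frame-size for the next round.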


Figure 24.3 (a) Empty slot, (b) successful slot, and (c) collision slot in EPC class 1 generation 2
protocol.

In Figure 24.3a, the reader first initiates a “Query” and broadcasts the signal to nearby tags. Since
no tag has picked zero as its slot counter, the slot is counted as an empty slot. Figure 24.3b
shows that after the first “Query” was sent, each tag decremented its slot counter by one. The reader
then sends “QueryRep” to tags in close proximity, and any tag that has zero as its slot counter
replies. When only one tag responds, a successful slot occurs and the tag replies to the
reader with its RN16. Figure 24.3c demonstrates that when two tags respond to the reader at the
same time, a collision slot occurs and, in this case, no information is transmitted.
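
The slot outcomes described above can be illustrated with a small simulation. The following is a minimal sketch, not the Gen2 protocol itself: it assumes a frame of 2^Q slots numbered 0 to 2^Q - 1, it assumes that collided tags are not retried within the same frame, and the function name `run_inventory_round` is hypothetical.

```python
import random


def run_inventory_round(num_tags, q, seed=None):
    """Simulate one frame and count empty, successful, and collision slots."""
    rng = random.Random(seed)
    # Each tag loads a random value in [0, 2**q - 1] into its slot counter.
    counters = [rng.randint(0, 2 ** q - 1) for _ in range(num_tags)]
    counts = {"empty": 0, "successful": 0, "collision": 0}
    for _ in range(2 ** q):
        # Tags whose slot counter is zero reply in the current slot.
        replies = sum(1 for counter in counters if counter == 0)
        if replies == 0:
            counts["empty"] += 1
        elif replies == 1:
            counts["successful"] += 1
        else:
            counts["collision"] += 1
        # QueryRep: every tag decrements its slot counter by one.
        # Tags that already replied drop below zero and stay silent.
        counters = [counter - 1 for counter in counters]
    return counts


if __name__ == "__main__":
    print(run_inventory_round(num_tags=50, q=6, seed=1))
```

The three counts always sum to the frame-size, which is what a frame-size estimator works from at the end of a round.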


24.2.3 Classifiers 


While anti-collision techniques can be applied to filter the incoming data, they cannot restore
highly ambiguous missing readings, nor can they correct wrong-reading anomalies. Thus, a highly
intelligent approach should be used to correct these anomalies after the data have been stored in
the database. One such approach is the integration of classifiers to determine whether an anomaly
is present and then to take the actions needed to correct the information and maintain a high
level of integrity.


24.2.3.1 Bayesian Networks 


A Bayesian network is a network designed to find the most probable solution to a given
problem. This is usually performed by determining the product of the evidence found in a situation
and comparing it with other possible causes until the most probable outcome is discovered. When
expressing the mechanics of any Bayesian network, there are three common media: a joint
distribution equation, an influence diagram, and a Bayesian network table. To demonstrate the
idea of a Bayesian network, we have developed an example scenario in which a network is developed
to determine the cause of a tree falling down (human or nature) when given such attributes as
council markings and weather. The specific rules for this example are that there is a very high
chance that the council cut down the tree if council markings are coupled with fine weather. However,
if the weather is stormy, there is less chance of the tree having been cut down by humans:


$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1}) \qquad (24.1)$$


The equation expresses the process used to determine the likelihood, as a percentage, of a cause
being true. The information depicted in the equation is then translated into a table. This table
sets the evidence against the causes, and a percentage is given to each case for the true and
false outcomes of each scenario. From this table, all the percentages are multiplied together and
a percentage score is given to each of the causes. A Bayesian network then concludes that the
most probable cause is the cause with the highest percentage.
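
The table-and-product procedure can be sketched for the falling-tree scenario as follows. This is only an illustration under stated assumptions: every probability below is a hypothetical value chosen to match the rules described above (not a figure from this chapter), and the sketch assumes that the two pieces of evidence are conditionally independent given the cause.

```python
# Hypothetical prior belief in each cause (assumed values).
PRIOR = {"human": 0.5, "nature": 0.5}

# Hypothetical table: P(evidence value | cause) for each attribute.
LIKELIHOOD = {
    "human": {
        "council_markings": {True: 0.9, False: 0.1},
        "weather": {"fine": 0.7, "stormy": 0.3},
    },
    "nature": {
        "council_markings": {True: 0.2, False: 0.8},
        "weather": {"fine": 0.1, "stormy": 0.9},
    },
}


def score_causes(evidence):
    """Multiply the table entries for each cause and normalise the scores."""
    scores = {}
    for cause, prior in PRIOR.items():
        score = prior
        for attribute, value in evidence.items():
            score *= LIKELIHOOD[cause][attribute][value]
        scores[cause] = score
    total = sum(scores.values())
    return {cause: score / total for cause, score in scores.items()}


def most_probable_cause(evidence):
    """Return the cause with the highest score."""
    scores = score_causes(evidence)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(most_probable_cause({"council_markings": True, "weather": "fine"}))
```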


24.2.3.2 Artificial Neural Network 


As seen in Figure 24.4, an artificial neural network (ANN) is a classifier designed to emulate the
learning behavior of the brain. It does this by creating a fixed number of neurons that are trained
to deliver a certain output when fed various inputs. The entire process is based on
the biological neuron: dendrites receive information that is passed to the cell body, whose
objective is to pass the information into the axon when certain requirements are met and, thus,
to the dendrites of other neurons via the synapse connection. The crucial difference between a digital
neuron and its biological counterpart is that there is a computational limit to the number of hidden
units that may be present within a network. Unfortunately, technology has not advanced enough
to emulate effectively and efficiently the number of neurons the human brain possesses, which is
estimated to be between 10 billion and 1 trillion [33].


Figure 24.4 A high-level interpretation of how a neural network is designed with its three main
layers: the input, hidden, and output layers.

The ANN consists of three main layers: the input layer, hidden layer(s), and the output layer [34].
Each neuron receives inputs that are combined at a central sum area. The neuron
then applies an activation function to derive a value for the output. Examples are the hard
limiter, which assigns +1 or -1 if the value is positive or negative, respectively, and sigmoidal
functions such as the one displayed in Equation 24.2. With regard to training the network, there
are several techniques available, such as back-propagation (BP) [35,36] and genetic algorithms
[37,38], which are both considered to be leaders with regard to the configuration of ANN:


$$f(x) = \frac{1}{1 + \exp(-x)} \qquad (24.2)$$
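
The two activation functions can be written out directly. The sketch below is illustrative only: the bias term and the treatment of an input of exactly zero in the hard limiter are assumptions, since the text specifies only the positive and negative cases.

```python
import math


def hard_limiter(x):
    """Return +1 for positive input and -1 for negative input.

    Assumption: an input of exactly zero is mapped to +1.
    """
    return 1 if x >= 0 else -1


def sigmoid(x):
    """Sigmoidal activation of Equation 24.2: f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, bias, activation):
    """Sum the weighted inputs (the central sum area), then apply the activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(total)


if __name__ == "__main__":
    print(neuron_output([1.0, 0.5], [0.4, -0.2], 0.1, sigmoid))
    print(neuron_output([1.0, 0.5], [0.4, -0.2], 0.1, hard_limiter))
```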


One way to configure the neural network weights is to use a training algorithm. Two dominant
training algorithms that have been proven to excel in network
training are the back-propagation algorithm and the evolutionary neural network. BP trains the
network by propagating the error back through the network and modifying the
weights after the output has been calculated [35]. As stopping criteria, the algorithm uses either
a predetermined maximum number of iterations or a root-mean-square (RMS) error threshold on the
calculated output [36].

The evolutionary neural network training algorithm, in contrast to BP, uses the theory of
genetic evolution to train the network weights. Similar to the genetic algorithm process of training
a Bayesian network, all the weights are added into a chromosome as genes to be manipulated
according to the fittest output obtained [37]. The weights are initialized as small random numbers,
and the algorithm checks whether a high enough score has been obtained or the generation limit has
been reached. If neither stopping criterion has been met, the algorithm
examines each chromosome in relation to achieving the correct output. A certain number of the unfit
chromosomes are then destroyed within the population and replaced with child chromosomes of
two of the fitter chromosomes. The child chromosomes can be created by various means, such as
one-point, two-point, or uniform crossover [38]. Mutation is then applied to a small
percentage of the population to ensure that the network avoids problems such as network paralysis or
local minima [39].
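
The loop described above can be sketched on a toy problem. The following is a minimal illustration, not the configuration used in this chapter: the task (a single sigmoid neuron learning logical OR), the population size, the number of chromosomes replaced, the Gaussian mutation, and the error threshold are all assumptions. The fittest chromosome is left unmutated so that the best error never gets worse.

```python
import math
import random

# Toy training set (logical OR); an assumption made for illustration.
SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]


def sigmoid(x):
    x = max(-60.0, min(60.0, x))  # clamp to avoid overflow in exp()
    return 1.0 / (1.0 + math.exp(-x))


def error(chromosome):
    """Sum of squared errors of one neuron; genes are [w1, w2, bias]."""
    w1, w2, bias = chromosome
    return sum((sigmoid(w1 * x1 + w2 * x2 + bias) - target) ** 2
               for (x1, x2), target in SAMPLES)


def one_point_crossover(parent_a, parent_b, rng):
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]


def evolve(pop_size=20, generations=50, n_replace=10,
           mutation_rate=0.1, target_error=0.05, seed=0):
    rng = random.Random(seed)
    # Weights start as small random numbers.
    population = [[rng.uniform(-0.5, 0.5) for _ in range(3)]
                  for _ in range(pop_size)]
    history = []  # best error seen at the start of each generation
    for _ in range(generations):
        population.sort(key=error)
        history.append(error(population[0]))
        if history[-1] <= target_error:  # score is high enough: stop
            break
        # Destroy the unfit chromosomes and breed replacements from fitter ones.
        survivors = population[:pop_size - n_replace]
        children = [one_point_crossover(*rng.sample(survivors, 2), rng)
                    for _ in range(n_replace)]
        population = survivors + children
        # Mutate a small share of the population, sparing the fittest.
        for chromosome in population[1:]:
            if rng.random() < mutation_rate:
                chromosome[rng.randrange(3)] += rng.gauss(0.0, 1.0)
    population.sort(key=error)
    return population[0], history


if __name__ == "__main__":
    best, history = evolve()
    print(best, history[0], error(best))
```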


24.2.3.3 Non-Monotonic Reasoning 


Non-monotonic reasoning (NMR) refers to a deterministic logic used to decipher the solution when
given a number of relevant pieces of evidence. NMR is set apart from classic monotonic reasoning in
that, rather than arriving at one conclusion for any given problem, NMR considers a number of
outcomes and eliminates or adds them as extra information becomes available. In particular, we have
investigated clausal defeasible logic (CDL) as the proof algorithm to arrive at a conclusion, as it
has been designed specifically to be implemented in a computer [40].

A language called “decisive programming language” (DPL), proposed by [41], has been
employed to illustrate scenarios that use CDL. Within DPL, several symbols are used to represent
different relationships between the entities preceding the relation (the antecedent) and the
entities positioned after the relation (the conclusion). The first symbol is the strict rule
relation, which is represented as “->.” It dictates that this rule is certain, with no possible
ambiguity involved. The second symbol is the defeasible rule symbol “=>,” denoting a relationship
in which it is defeasible (that is, open to being overridden by other evidence) to say that the
former entity will result in the latter entity. The third symbol is the warning rule symbol
“~>,” which describes when the former entity cannot disprove the latter (usually the negative of the
latter). Other symbols that are used include the priority relation “>,” which dictates that the former
rule is greater than the latter rule, and the negative symbol “~,” which turns the following variable
into its negative counterpart. Although it is true that one conclusion must be drawn for any given
situation, CDL has several levels of confidence, represented in formulae that may be used to obtain
a different correct answer. These different formulae include the following:


■ μ: This formula uses only certain information to obtain its conclusion.

■ π: This formula allows conclusions in which ambiguity is propagated.

■ β: This formula does not allow any ambiguity to be used in obtaining its conclusion.

■ α: A formula in which any conjunction of the π and β formulae is used to reach its
conclusion.

■ δ: The disjunction of π and β is used to draw conclusions.


As discussed by [42], other strengths that set CDL apart from other reasoning algorithms are
its ability to use team defeat and failure-by-looping, and to discover the loops in a given reasoning
system within a set number of steps. This logical engine has already been tested and implemented
in two different scenarios: to allow a robot dog to play soccer, and inside a robot designed to
alert individuals to an emergency in an elderly care situation [43].


24.3 Anti-Collision Techniques 


In this section, we introduce a deterministic joined Q-ary tree [44] with the goal of minimizing
the memory usage queried by the RFID reader. Most implementations of tree-based algorithms
are deployed with older types of EPC class 1 tags, which have limited memory and capability. We also
introduce the probabilistic cluster-based technique (PCT) [3] anti-collision method to improve the
performance of the tag recognition process in probabilistic anti-collision algorithms. The remainder
of this section explains our proposed joined Q-ary tree and PCT and presents a
comparative analysis of both techniques.


24.3.1 Deterministic Anti-Collision Approaches 


This section explains the EPC encoding schemes, discusses the typical scenarios, and lays
the foundation of the joined Q-ary tree.


24.3.1.1 EPC Encoding Schemes Analysis 


The most common type of encoding is the general identifier 96-bit (GID-96) scheme, which is
defined for a 96-bit EPC, is independent of any existing identity specification or convention, and
can be used in most events. In addition to the header that guarantees uniqueness of the encoding
type, the GID is composed of three fields: the general manager number (GMN), the object class (OC),
and the serial number (SN), as shown in Table 24.2.

In order to manage and monitor the traffic of RFID data effectively, the EPC pattern is usually
used to keep the unique identifier on each of the items arranged within a specific range [45]. The
EPC pattern does not represent a single tag encoding, but rather refers to a set of tag encodings. For
instance, the GID-96 includes three fields in addition to the header, with a total of 96 bits of
binary value. 25.1545.[3456-3478].[778-795] is a sample of the EPC pattern in decimal, which will
later be encoded to binary and embedded onto tags. Thus, within this sample pattern, the header is fixed
to 25 and the GMN is 1545, while the OC can be any number between 3456 and 3478, and the
SN can be anything between 778 and 795.
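
A short sketch may help make the pattern concrete. It parses the sample pattern above, checks whether a tag falls inside it, and packs the four fields into a 96-bit string using the field widths of Table 24.2. The function names are hypothetical, and treating the ranges as inclusive is an assumption.

```python
# Field widths in bits, as listed in Table 24.2.
FIELD_BITS = (("header", 8), ("gmn", 28), ("oc", 24), ("sn", 36))


def parse_pattern(pattern):
    """Parse 'H.GMN.[lo-hi].[lo-hi]' into inclusive (low, high) ranges."""
    ranges = []
    for part in pattern.split("."):
        if part.startswith("[") and part.endswith("]"):
            low, high = part[1:-1].split("-")
            ranges.append((int(low), int(high)))
        else:
            ranges.append((int(part), int(part)))
    return ranges


def matches(pattern, fields):
    """Check whether (header, gmn, oc, sn) falls inside the pattern."""
    return all(low <= value <= high
               for (low, high), value in zip(parse_pattern(pattern), fields))


def encode_gid96(fields):
    """Concatenate the four fields into a 96-character bit string."""
    bits = ""
    for (name, width), value in zip(FIELD_BITS, fields):
        if not 0 <= value < 2 ** width:
            raise ValueError(f"{name} does not fit in {width} bits")
        bits += format(value, f"0{width}b")
    return bits


if __name__ == "__main__":
    pattern = "25.1545.[3456-3478].[778-795]"
    tag = (25, 1545, 3460, 780)
    print(matches(pattern, tag), encode_gid96(tag))
```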


24.3.1.2 Warehouse Distribution Scenarios 


For deterministic anti-collision approaches, we examine specific scenarios based on the assumption
that items tend to move and stay together through different locations, especially in a large warehouse.
We focus on a crystal warehouse scenario using the GID-96 encoding scheme, which can be classified
into four different scenarios: (1) unique item level, (2) unique container level, (3) unique company
level, and (4) unique warehouse level.


Table 24.2 GID-96 Includes Three Fields in Addition to the Header, with a Total of 96-Bits
Binary Value

GID-96                       | Bit | Maximum Decimal/Binary
Header (H)                   | 8   | 0011 0101
General manager number (GMN) | 28  | 268,435,455
Object class (OC)            | 24  | 16,777,215
Serial number (SN)           | 36  | 68,719,476,735

Note: Only “H” is shown in binary, while the rest are shown in decimal.


Figure 24.5 Crystal warehouse scenario: (a) unique item level, (b) unique container level, (c)
unique company level, and (d) unique warehouse level. The panels show (a) two red-wine cases,
(b) a red-wine case and a white-wine case, (c) a crystal plate case and a white-wine case, and
(d) a crystal plate case and a plastic plate case. H = header, GMN = general manager number,
OC = object class, SN = serial number.


Unique item-level scenario: This scenario occurs when two collided tags (GID-96 encoding) are
captured and they have the same encoding scheme (header), the same GMN, and the same OC, but
different SNs. In the crystal warehouse example of Figure 24.5a, two collided tags are captured
with the same encoding scheme, GMN, and OC. We believe that the two
tags are attached to two different cases of red wine.

Unique container-level scenario: This scenario takes place when two collided tags are captured
and they have the same header and the same GMN, but different OCs and different SNs. Figure 24.5b shows
that crystal red-wine glasses and crystal white-wine glasses are packed in different cases and pallets
because they are different types of wine glasses. Within this scenario, each case of wine glasses
has a unique SN attached to it, with a different OC for each pallet of white wine or red wine.

Unique company-level scenario: This scenario is illustrated in Figure 24.5c. Two collided tags are
captured and they have the same header, but unique GMNs, OCs, and SNs. We believe that one tag
is attached to the crystal plate case, while the other tag is attached to the white-wine case. We can
assume that there are two different companies producing separate crystal ware, and that the wine
glasses and plates are from different companies but share the same warehouse because they are both
crystal.

Unique warehouse-level scenario: This scenario occurs when two collided tags are captured and
they have different headers, GMNs, OCs, and SNs. We can assume that all items are from different
companies that use different encoding schemes. For example, Figure 24.5d shows that two items
of different make, one made from crystal and the other from plastic, are allocated
to the same warehouse. The unique warehouse-level scenario will not be discussed any further in
this chapter because we are only looking at a large warehouse distribution where most items move
together as a group. Therefore, most items from the same type of manufacturing will stay together
until they are deployed to smaller retailers.
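
The four scenarios can be told apart by comparing the fields of two collided tags from the most significant field downward. The sketch below is a simplification under stated assumptions: it classifies by the first field that differs and does not verify that every later field also differs, and the `Tag` type and function name are hypothetical.

```python
from collections import namedtuple

# Hypothetical container for the four GID-96 fields.
Tag = namedtuple("Tag", "header gmn oc sn")


def collision_scenario(tag_a, tag_b):
    """Name the scenario level of two collided tags, or None if identical."""
    if tag_a.header != tag_b.header:
        return "unique warehouse level"
    if tag_a.gmn != tag_b.gmn:
        return "unique company level"
    if tag_a.oc != tag_b.oc:
        return "unique container level"
    if tag_a.sn != tag_b.sn:
        return "unique item level"
    return None  # the same EPC was read twice


if __name__ == "__main__":
    red_wine_1 = Tag(25, 1545, 3456, 778)
    red_wine_2 = Tag(25, 1545, 3456, 779)
    print(collision_scenario(red_wine_1, red_wine_2))
```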


24.3.1.3 Joined Q-ary Tree 


The joined approach [44] is a combination of Q-ary trees, specifically 2-ary and 4-ary trees, which
have been identified as the best Q-ary trees [46,47]. The joined Q-ary tree employs the right
combination of Q-ary trees for each specific scenario. Assuming that most items in the warehouse
move in bulk, the first few bits of the EPC will be identical and the remaining bits
will be very similar. In order to optimize the performance of the joined Q-ary tree, the right
separating point (SP) between the two Q-ary trees needs to be configured. This procedure will
further reduce the accumulated bits from the reader’s queries and improve the robustness of the
overall identification process.


Figure 24.6 Sample of: (a) a Naive 4-ary tree; (b) a naive 2-ary tree, and (c) a joined Q-ary tree.


Figure 24.6 shows examples of a naive 2-ary tree, a naive 4-ary tree, and a joined Q-ary tree.
The joined Q-ary tree bonds 2-ary and 4-ary trees together and applies each to specific bits of the
EPC, depending on how identical or unique those bits are.
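
To make the trade-off between the two naive trees concrete, the sketch below counts reader queries for a generic Q-ary query tree. It is not the joined Q-ary tree of [44]: the rule that decides which bits use which tree is not reproduced here. It assumes that tag IDs are unique bit strings of equal length and that the reader starts with an empty prefix; the function name is hypothetical.

```python
def query_tree_slots(tag_ids, bits_per_level):
    """Count slots for a Q-ary query tree, where Q = 2**bits_per_level."""
    counts = {"empty": 0, "successful": 0, "collision": 0}
    id_length = len(tag_ids[0])
    stack = [""]  # prefixes that the reader still has to query
    while stack:
        prefix = stack.pop()
        matching = [tag for tag in tag_ids if tag.startswith(prefix)]
        if not matching:
            counts["empty"] += 1
        elif len(matching) == 1:
            counts["successful"] += 1
        else:
            counts["collision"] += 1
            # Extend the prefix by up to bits_per_level bits, one branch each.
            step = min(bits_per_level, id_length - len(prefix))
            for branch in range(2 ** step):
                stack.append(prefix + format(branch, f"0{step}b"))
    return counts


if __name__ == "__main__":
    tags = ["0000", "0001", "0110", "1111"]
    print("2-ary:", query_tree_slots(tags, 1))
    print("4-ary:", query_tree_slots(tags, 2))
```

In this sketch a 4-ary tree tends to produce fewer collision slots but more empty slots than a 2-ary tree, which is the tension that choosing a separating point is meant to balance.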


24.3.1.4 EPC Bits Prediction and Classification 


In a warehouse distribution environment, according to the unique item-level and unique
container-level scenarios, it is known that the first 36 bits of the EPC (header and GMN) are
identical. However, the 24 bits of the OC can be partly identical and partly unique across tags,
depending on how many pallets exist within one interrogation zone. For example, if there are 5
pallets of 12 cases each in the interrogation zone, there will be 5 different OCs and 60 unique SNs
for all 60 items (cases).

Since the OC occupies 24 bits of the EPC (allowing 16,777,215 unique values) but only 5 unique OCs
are needed, we must calculate the number of unique bits needed in order to apply the right Q-ary
tree. This also applies to the SN, which contains a 36-bit string. Assuming that the EPC pattern is
used, not all 36 bits of these strings will be unique.

Table 24.3 shows a formal structure for the bits classification of the GID-96 EPC. It can be seen
that the identical bits of the EPC always cover the first 36 bits. These include the
8 bits of the header and the 28 bits of the GMN, which are always the same for all tags. For the OC,
24 bits are available, and the number of unique bits within the object class (UOC) can be predicted
using Equation 24.3. In addition, the number of unique bits within the 36-bit serial number (USN)
can be predicted using the same equation.

Our method is executed based on the assumption that the approximate number of tags (pallets,
cases) is known prior to the identification process. This information is needed for the unique bits
calculation: UOC and USN from Table 24.3. However, in most circumstances, the number of tags is
unknown until the first query is issued by the reader. Therefore, the UOC and USN of the joined
Q-ary tree can initially be set to zero, and these two parameters can be computed after the first
round of identification.


Table 24.3 Formal Structure of Bits Classification of EPC GID-96 Bits

                       | Length | Identical | Unique
Header                 | 8      | 8         | 0
General manager number | 28     | 28        | 0
Object class           | 24     | 24 - UOC  | UOC (a)
Serial number          | 36     | 36 - USN  | USN (b)

(a) UOC is the number of unique bits within the object class.
(b) USN is the number of unique bits within the serial number.

The joined Q-ary tree adaptively adjusts its tree branches at specific SPs. These SPs are configured
according to the identical bits and unique bits within the EPC data. In order to calculate the
estimated number of unique bits within an EPC, we need the average number of tags within an
interrogation zone, to which we then apply the following equation:


$$B = \log_2(N) \qquad (24.3)$$


where
N is the number of tags
B is the number of unique bits of the EPC
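
Applied to the earlier example of 5 pallets with 12 cases each, Equation 24.3 can be evaluated as in the following sketch. Rounding up to a whole number of bits is an assumption (the equation itself gives a real number), and the function names are hypothetical.

```python
def unique_bits(count):
    """Smallest whole number of bits B with 2**B >= count (ceiling of log2)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return (count - 1).bit_length()


def classify_gid96_bits(num_pallets, cases_per_pallet):
    """Fill in the identical/unique split of Table 24.3 for one zone."""
    uoc = unique_bits(num_pallets)                      # one OC per pallet
    usn = unique_bits(num_pallets * cases_per_pallet)   # one SN per case
    return {
        "header": {"identical": 8, "unique": 0},
        "gmn": {"identical": 28, "unique": 0},
        "oc": {"identical": 24 - uoc, "unique": uoc},
        "sn": {"identical": 36 - usn, "unique": usn},
    }


if __name__ == "__main__":
    print(classify_gid96_bits(num_pallets=5, cases_per_pallet=12))
```

For this example the sketch gives 3 unique bits for the OC (5 pallets) and 6 unique bits for the SN (60 cases).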


24.3.2 Probabilistic Anti-Collision Approaches 


This section presents the mathematical fundamentals of probabilistic anti-collision schemes and the
foundations of the proposed PCT.


24.3.2.1 Mathematical Fundamentals for ALOHA-Based Tag Estimation


In the framed-slotted ALOHA-based probabilistic scheme, the binomial distribution is a good basis
for estimating the number of tags present. For a given initial Q in a frame with F slots
and n tags, the expected value of the number of slots with occupancy number x is as follows:


$$a_x = F \binom{n}{x} \left(\frac{1}{F}\right)^{x} \left(1 - \frac{1}{F}\right)^{n-x}$$


Therefore, the expected numbers of empty slots e, successful slots s, and collision slots c are
given by the following equations:


$$e = a_0 = F \left(1 - \frac{1}{F}\right)^{n}, \qquad s = a_1 = n \left(1 - \frac{1}{F}\right)^{n-1}, \qquad c = F - e - s$$


Thus, the system efficiency (E) is defined as the ratio between the number of successful slots and
the frame-size, as per the following equation:


$$E = \frac{s}{F}$$


It has been proven that the highest efficiency can be obtained if the frame-size F is equal to the
number of tags n, provided that all slots have the same fixed length:


$$F(\text{optimal}) = n$$
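
The relationship between frame-size and efficiency can be checked numerically from the binomial model. The sketch below assumes the standard framed-slotted ALOHA model in which each of the n tags picks one of F slots uniformly at random; the function names are hypothetical.

```python
from math import comb


def expected_slots(frame_size, num_tags, occupancy):
    """Expected number of slots holding exactly `occupancy` tags."""
    p = 1.0 / frame_size
    return (frame_size * comb(num_tags, occupancy)
            * p ** occupancy * (1.0 - p) ** (num_tags - occupancy))


def expected_counts(frame_size, num_tags):
    """Expected numbers of empty, successful, and collision slots."""
    empty = expected_slots(frame_size, num_tags, 0)
    successful = expected_slots(frame_size, num_tags, 1)
    collision = frame_size - empty - successful
    return empty, successful, collision


def efficiency(frame_size, num_tags):
    """Expected successful slots divided by the frame-size."""
    return expected_counts(frame_size, num_tags)[1] / frame_size


if __name__ == "__main__":
    n = 128
    best = max(range(1, 513), key=lambda f: efficiency(f, n))
    print(best, round(efficiency(best, n), 3))
```

For 128 tags the search returns a frame-size of 128, with an efficiency close to 1/e.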


Therefore, we make the assumption that by keeping the number of tags close to the available
frame-size, the optimal performance efficiency can be obtained. According to the literature, the
theoretically optimal efficiency achievable in ALOHA-based systems is 36.8%.


24.3.2.2 Probabilistic Cluster-Based Technique


The PCT [3] employs a dynamic probabilistic algorithm concept and uses a group-splitting rule to
split the backlog into groups if the number of unread tags is higher than the maximum frame-size.

The PCT approach first estimates the backlog, that is, the number of remaining tags within the
interrogation zone. If the backlog is larger than the specified frame-size, it splits the
backlog into a number of groups and allows only one group of tags to respond.
The PCT approach derives its rules using particular equations, according to the optimal system
efficiency obtained for specific numbers of tags. We first conducted an experiment to acquire the
optimal frame-size for specific numbers of tags, as shown in Figure 24.7. It can be seen that the
optimal system efficiency achieved by the probabilistic ALOHA method is approximately 38% and that
the optimal number of tags is close to the maximum frame-size. Efficiency is calculated as shown in
Equation 24.4:


$$\text{Efficiency} = \frac{S}{S + C + E} \qquad (24.4)$$


where
S is the number of successful slots
C is the number of collision slots
E is the number of empty slots


Figure 24.7 Performance efficiency of different frame-sizes for different numbers of tags.
(Vertical axis: efficiency. Horizontal axis: number of tags, 0 to 704. Series: FS = 32, 64, 128,
256, and 512.)


24.3.2.3 PCT Rules


As described in the previous section, the PCT [3] method splits the backlog into groups when the
number of unread tags is higher than the maximum frame-size, and it allows only one group of tags
to respond at a time.

The reader then issues a “Query” that contains a “Q” parameter to specify the frame-size
(frame-size F(min) = 0; F(max) = 2^Q - 1). Each selected tag in the group picks a random
number between 0 and 2^Q - 1 and places it into its slot counter. Only the tag that picks zero
as its slot counter responds to the request. When the estimated backlog is below
the threshold, the reader adjusts the frame-size without grouping the unread tags. After each
read cycle, the reader estimates the backlog using the PTES algorithm and adjusts its
frame-size.

Table 24.4 shows the PCT rule. For instance, if the backlog equals 900 tags, the
PCT algorithm will split the unread tags into three groups of Q8 (2^8 = 256).
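
The rule in Table 24.4 amounts to a range lookup. The sketch below transcribes the rows of the table and returns the list of frame-sizes, one per group. The function name is hypothetical, and backlogs outside the listed ranges are treated as an error because the table does not cover them.

```python
# Rows of Table 24.4: (low, high, FSA, groups A, FSB, groups B).
PCT_RULES = [
    (1233, 1408, 256, 4, None, 0),
    (1057, 1232, 256, 3, 128, 1),
    (881, 1056, 256, 3, None, 0),
    (705, 880, 256, 2, 128, 1),
    (529, 704, 256, 2, None, 0),
    (353, 528, 256, 1, 128, 1),
    (177, 352, 256, 1, None, 0),
    (89, 176, 128, 1, None, 0),
    (45, 88, 64, 1, None, 0),
    (23, 44, 32, 1, None, 0),
    (12, 22, 16, 1, None, 0),
    (6, 11, 8, 1, None, 0),
]


def pct_rule(backlog):
    """Return the frame-size of each group for an estimated backlog."""
    for low, high, fsa, groups_a, fsb, groups_b in PCT_RULES:
        if low <= backlog <= high:
            frame_sizes = [fsa] * groups_a
            if fsb is not None:
                frame_sizes += [fsb] * groups_b
            return frame_sizes
    raise ValueError("backlog is outside the ranges of Table 24.4")


if __name__ == "__main__":
    print(pct_rule(900))  # three groups with a frame-size of 256
```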


Table 24.4 PCT Rule: Number of Unread Tags, Optimal Frame-Size (FSA and FSB), and
Number of Groups (A and B)

Backlogs     | FSA | Group A | FSB | Group B
1233 to 1408 | 256 | 4       | -   | -
1057 to 1232 | 256 | 3       | 128 | 1
881 to 1056  | 256 | 3       | -   | -
705 to 880   | 256 | 2       | 128 | 1
529 to 704   | 256 | 2       | -   | -
353 to 528   | 256 | 1       | 128 | 1
177 to 352   | 256 | 1       | -   | -
89 to 176    | 128 | 1       | -   | -
45 to 88     | 64  | 1       | -   | -
23 to 44     | 32  | 1       | -   | -
12 to 22     | 16  | 1       | -   | -
6 to 11      | 8   | 1       | -   | -


24.3.3 Comparative Analysis of Deterministic and Probabilistic Techniques


In this chapter, we have empirically compared the performance of the joined Q-ary tree against the
PCT anti-collision approach because our deterministic and probabilistic methods have outperformed
existing techniques in their own domains [3,44,48]. The joined Q-ary tree uses fewer resources, is
simple to implement, and needs little reader power and memory, because
it does not need to keep memory during identification. On the other hand, the PCT works well
in arbitrary situations, minimizes the resources used, and increases system efficiency, without the need
for complex implementation. We believe that this comparative analysis is necessary to identify the
best overall method for specific circumstances.


24.3.3.1 Data Set 


For the joined Q-ary tree anti-collision approach, there are 10 pallets of inventory in test case A,
with each pallet containing 100 cases/tags, giving a total of 1000 tags. Test case B also contains
1000 tags, but each of its 20 pallets holds only 50 cases/tags.


■ Test case A: joined Q-ary tree with 100 tags per pallet (Joined(100)): 10 pallets, 100 cases
each, total 1000 tags

■ Test case B: joined Q-ary tree with 50 tags per pallet (Joined(50)): 20 pallets, 50 cases each,
total 1000 tags


For the probabilistic anti-collision approach, we considered different numbers of tags, from 100 to
1000 tags. For each identification round, an optimal tunable initial Q is applied.


24.3.3.2 Results 


In the empirical study, we investigated the performance of our proposed joined Q-ary
tree and PCT. Figure 24.8a illustrates that the difference in performance between the methods
increases with the number of tags; this becomes particularly visible at
1000 tags. The overall slot counts show that the joined Q-ary tree with 100 tags per
pallet (Joined(100)) required the fewest slots throughout the whole experiment and
therefore also the shortest identification time. In contrast, the joined Q-ary tree with
50 tags per pallet (Joined(50)) performed poorly compared with Joined(100) and PCT. These
results indicate that the selection of the EPC pattern has a large impact on the performance of
the joined Q-ary tree. When the chosen EPC pattern involves very small groups of tags (such
as 50 tags per pallet), the performance of the joined Q-ary tree cannot be optimized.

Figure 24.8b shows the performance efficiency of all methods. It can be seen that Joined(100)
achieved close to 47% efficiency once the number of tags reached 1000. Additionally, we can see
that the performance efficiency of both the Joined(100) and Joined(50) methods keeps increasing
with the number of tags. In contrast, the PCT cannot achieve a performance efficiency
higher than 38%. By examining Figure 24.8, it can be assumed that the efficiency of the joined
Q-ary tree will increase slowly once the number of tags within the interrogation zone becomes very
high. For Joined(50), if the number of tags keeps increasing, it is possible that the performance
efficiency will reach the same level as PCT.


Figure 24.8 Comparative analysis of joined Q-ary tree vs. PCT: (a) number of slots comparison
and (b) performance efficiency.


From the comparative analysis, we have identified certain properties of importance for anti- 
collision methods in general. For deterministic methods, we have discovered that there are impacts 
from similar EPC patterns, the number of tags within one group of the EPC pattern, and the overall 
number of tags within the interrogation zone. For probabilistic methods, we have determined that 
the performance of the anti-collision technique depends on the initial frame-size (or the Q value) 
specification, the accuracy of backlog prediction techniques, and the overall number of tags within 
the interrogation zone. 


24.4 Deferred Cleaning Approaches 


In the following section, we discuss the cleaning of the stored RFID observations after they have
been placed inside the data warehouse. Unlike with the anti-collision techniques, at this stage it
is possible not only to restore missing records in the database, but also to eliminate wrong and
duplicate data. Thus, we have divided this section into false-negative and false-positive cleaning
sections to correct missing and wrong/duplicate anomalies, respectively. In each of the
methodologies, we apply Bayesian network, neural network, and non-monotonic reasoning classifiers,
and then compare the approaches to determine the highest performing methodology.


24.4.1 False-Negative Cleaning 


To accurately correct missing readings, we have constructed an advanced data analysis methodology
coupled with high-level intelligence to decipher the most likely candidate observations
to be returned into the data set. Specifically, the concept we have introduced intelligently
analyzes the missing data anomaly and uses the Bayesian network [49,50], neural network [51], and
non-monotonic reasoning [52-54] classifiers to find the correct observations to load back into the
database. We first outline the motivation and scenario considered in this work, followed
by a description of the system architecture of our approach. These discussions are followed
by the database structure that houses the RFID observations and all assumptions made in
our methodology. We then present the results obtained from our experimental evaluation before
summarizing our findings.


24.4.1.1 System Architecture 


We have divided our system's architecture into three core components. The first, which we have
named the analysis phase, is designed to analyze the data where the missed reading occurred.
The data discovered in this analysis phase are then passed on to the intelligence phase, where
the correct permutation is selected. After the resulting data set has been chosen, the loading
phase completes the program's cycle by inputting the information back into the data
warehouse.

Analysis phase: In the analysis phase, the tool locates missed readings and identifies
essential data about the anomaly. The first process is to divide the tags into “tag streams” (Definition
24.1), as seen in Figure 24.9. Each tag stream contains chronological information relating to only one
individual tag. From these tag streams, certain information is ascertained relating to the nature
of the false-negative anomaly. This includes finding the reader locations of the observations two
readings before and directly before the anomaly (a and b, respectively) and the two readings
directly after the anomaly (c and d, respectively). Additionally, the shortest path between readings
b and c is found using the map data. The total number of missing readings, calculated via the
number of missing timestamps (n), and the number of observations within the shortest path (s) are
then calculated.


Definition 24.1 Tag stream: We define tag streams as individually analyzed streams for one tag 
from the mass amount of readings. 


The analytical information obtained from our approach includes detecting if readers a and b
are equal (a == b); determining if readers b and c are relatively close to each other according to
the map data (b <> c); discovering if readers b and c are equal (b == c); finding if readers
c and d are equal (c == d); and discovering if n is equal to, less than, or greater than s minus
two (n == (s - 2)), (n < (s - 2)), (n > (s - 2)). We subtract two from s because the shortest path
always includes the boundary readers b and c, which are not part of the missing gap and are not
included within the n calculation. All of these analytical Boolean variables are then passed on to
the intelligence phase, which uses them to seek out the most suitable imputed reader values. We use
four main comparison operators to obtain this binary analytical information: the
equivalence symbol ==, the less-than symbol <, the greater-than symbol >, and the <> symbol, which
we have elected to represent geographical proximity.
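
The Boolean variables above can be derived from one gap in a tag stream as sketched below. Several details are assumptions made for illustration: the map data are modelled as a hypothetical adjacency dictionary of readers, the shortest path is found by breadth-first search, and “relatively close” (b <> c) is taken to mean that b and c are at most `proximity` hops apart.

```python
from collections import deque


def shortest_path(reader_map, start, goal):
    """Breadth-first search; returns the readers from start to goal inclusive."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in reader_map.get(path[-1], ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    raise ValueError("no path between the two readers")


def analyse_gap(a, b, c, d, n, reader_map, proximity=1):
    """Build the analytical Boolean variables for one false-negative gap."""
    s = len(shortest_path(reader_map, b, c))  # includes b and c themselves
    return {
        "a==b": a == b,
        "b<>c": (s - 1) <= proximity,  # assumed meaning of geographical proximity
        "b==c": b == c,
        "c==d": c == d,
        "n==(s-2)": n == s - 2,
        "n<(s-2)": n < s - 2,
        "n>(s-2)": n > s - 2,
    }


if __name__ == "__main__":
    corridor = {"R1": ["R2"], "R2": ["R1", "R3"], "R3": ["R2", "R4"], "R4": ["R3"]}
    print(analyse_gap("R1", "R1", "R4", "R4", n=2, reader_map=corridor))
```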


False-negative tag stream 


Figure 24.9 A visual representation of how we analyze one tag at a given moment within a tag 
stream. 


Figure 24.10 An illustration of what reader values are placed into each false-negative anomaly 
for each of the five different permutations. 


Intelligence phase: The intelligence phase occurs when the various permutations of the missing 
data are generated as candidates to be restored in the data set. The five different permutations that 
are generated, depicted in Figure 24.10, are described as follows: 


■ Permutation 1: All missing values are replaced with the reader location of observation b. 

■ Permutation 2: All missing values are replaced with the reader location of observation c. 

■ Permutation 3: The shortest path is slotted into the middle of the missing data gap. Any 
additional missing gaps on either end of the shortest path are substituted with values b for 
the left side and c for the right. 

■ Permutation 4: The shortest path is slotted into the latter half of the missing data gap. Any 
additional missing gaps on the former end are substituted with value b. 

■ Permutation 5: As the antithesis of Permutation 4, the shortest path is slotted into the 
former half of the missing data gap. Additional missing gaps found at the latter end of the 
missing data gap are substituted with value c. 
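
The five candidates can be sketched as follows. This is an illustrative reading rather than the published code: it assumes that only the interior of the shortest path (the readers strictly between b and c) is slotted into the gap, that an odd number of spare slots in Permutation 3 puts the extra slot on the right, and that Permutations 3 to 5 are unavailable (None) when the path does not fit. The function name is hypothetical.

```python
def generate_permutations(b, c, path, n):
    """Build the five candidate fills for a gap of n missing readings.

    path is the shortest reader path from b to c, including both ends.
    """
    interior = list(path[1:-1])      # s - 2 readers strictly between b and c
    spare = n - len(interior)        # slots left over once the path is placed
    candidates = {1: [b] * n, 2: [c] * n}
    if spare < 0:
        # assumption: the path cannot be placed, so 3-5 yield no candidate
        candidates.update({3: None, 4: None, 5: None})
        return candidates
    left = spare // 2                # assumption: extra odd slot goes right
    right = spare - left
    candidates[3] = [b] * left + interior + [c] * right  # path in the middle
    candidates[4] = [b] * spare + interior               # path in latter half
    candidates[5] = interior + [c] * spare               # path in former half
    return candidates
```

With b = R1, c = R4, the path R1, R2, R3, R4 and n = 4, Permutation 3 yields R1, R2, R3, R4, Permutation 4 yields R1, R1, R2, R3, and Permutation 5 yields R2, R3, R4, R4.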


After the data analysis and permutation formations are completed, all relevant information 
found in the analytical phase is treated as a feature set definition and passed into the 
Bayesian network, the neural network, or the non-monotonic reasoning engine. The chosen classifier will 
then return the permutation that has been found to best suit the missing data. With regard to 
the Bayesian network, all weights inside the network are first initialized with small random numbers. 
After this, we utilize a genetic algorithm to train these weights on a training set of 
previously correct permutations for each variation of the feature set definition. The network 
configuration resulting from this training will then provide the optimal permutation of readings to 
be inserted back into the data warehouse. 

The ANN accepts seven binary inputs to reflect the analysis data (a == b, c == d, etc.) 
and has five binary outputs to reflect the permutations that have been found. The ANN also 
includes nine hidden units in one hidden layer. We specifically chose this configuration as 
we have found that there should be a larger number of hidden nodes than inputs and that one layer 
is sufficient for our network at this moment. If we chose to extend the complexity of our 
system, we may wish to increase the number of hidden units, the number of layers, or both. 

We have also applied a momentum term and a learning rate whose values are 0.4 and 0.6, 
respectively, to avoid local minima and network paralysis. The input and output values are not 
encoded as 1 and 0, as this may not yield a very high classification rate. Instead, we use the values 0.9 
and 0.1, respectively. We set two stopping criteria: the RMS error falling 
below 0.1 and a limit of 1000 iterations. We have also utilized the sigmoidal activation function to 
derive our outputs. 
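
The shape of this network can be sketched as a forward pass with seven inputs, nine hidden units, five outputs, sigmoid activations, and the 0.9/0.1 encoding. The sketch below is an assumption-laden illustration only: the weights are random rather than trained, the bias input on each unit is an addition not mentioned in the text, and all names are hypothetical.

```python
import math
import random


def encode(bits):
    """Map Boolean inputs to 0.9 / 0.1 instead of 1 / 0."""
    return [0.9 if bit else 0.1 for bit in bits]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def init_layer(n_in, n_out, rng):
    """One weight row per unit; the final weight is a bias (an assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward(layer, inputs):
    extended = inputs + [1.0]  # constant input feeding the bias weight
    return [sigmoid(sum(w * x for w, x in zip(row, extended))) for row in layer]


rng = random.Random(0)
hidden = init_layer(7, 9, rng)   # seven analysis inputs, nine hidden units
output = init_layer(9, 5, rng)   # five outputs, one per permutation


def classify(bits):
    """Return the permutation (1-5) with the strongest output and all outputs."""
    activations = forward(output, forward(hidden, encode(bits)))
    return activations.index(max(activations)) + 1, activations
```

Training (by back-propagation with the stated momentum and learning rate, or by a genetic algorithm) would adjust `hidden` and `output`; it is omitted here to keep the sketch short.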

The NMR classifier is based on the rules we have created, as displayed in Tables 24.5 through 
24.9. Each of the rules within the tables is a combination of the analytical variables gathered 
in the analysis phase, joined by “and” operators (/\). Within the 
logic engine, a rule with a larger rule number takes precedence (e.g., 
rule 17 will beat rule 4 in Table 24.6). In the event that more than one permutation has been 
found to be ideal in a given situation, we use the following hierarchical weighting: permutation 
3 > permutation 1 > permutation 2 > permutation 4 > permutation 5. In the unlikely case 
where no conclusions have been drawn from the NMR engine, permutation 3 is elected as 
the default candidate because it has perfect symmetry within the imputed data. This ordering 
has been configured to give the most accurate conclusion, assuming that the number of consecutive 
missed readings is low due to the randomness of the anomalies. 


Table 24.5 Table Depicting the Non-Monotonic Reasoning Rules Used to 
Create the Permutation 1 Logic Engine 


Rule No.   Rule                                                      Conclusion 
1          b == c                                                    ~Perm1 
2          b == c /\ n < (s-2)                                       ~Perm1 
3          b == c /\ n == (s-2)                                      ~Perm1 
4          a == b                                                    Perm1 
5          c == d /\ n == (s-2)                                      ~Perm1 
6          c == d /\ n > (s-2)                                       ~Perm1 
7          c == d /\ b <> c /\ n == (s-2)                            ~Perm1 
8          c == d /\ b <> c /\ n > (s-2)                             ~Perm1 
9          a == b /\ b <> c                                          Perm1 
10         a == b /\ b <> c /\ n == (s-2)                            Perm1 
11         a == b /\ b <> c /\ b == c /\ ~c == d /\ n == (s-2)       Perm1 
12         c == d                                                    ~Perm1 
13         c == d /\ b <> c                                          ~Perm1 
14         a == b /\ b <> c /\ n > (s-2)                             Perm1 
15         a == b /\ b == c /\ b <> c /\ ~c == d /\ n > (s-2)        Perm1 
16         ~b <> c                                                   ~Perm1 
17         n < (s-2)                                                 ~Perm1 


Table 24.6 Table Depicting the Non-Monotonic Reasoning Rules Used to 
Create the Permutation 2 Logic Engine 


Rule No.   Rule                                                      Conclusion 
1          b == c                                                    ~Perm2 
2          b == c /\ n < (s-2)                                       ~Perm2 
3          b == c /\ n == (s-2)                                      ~Perm2 
4          c == d                                                    Perm2 
5          a == b /\ n == (s-2)                                      ~Perm2 
6          a == b /\ n > (s-2)                                       ~Perm2 
7          a == b /\ b <> c /\ n == (s-2)                            ~Perm2 
8          a == b /\ b <> c /\ n > (s-2)                             ~Perm2 
9          b <> c /\ c == d                                          Perm2 
10         b <> c /\ c == d /\ n == (s-2)                            Perm2 
11         ~a == b /\ b <> c /\ b == c /\ c == d /\ n == (s-2)       Perm2 
12         a == b                                                    ~Perm2 
13         a == b /\ b <> c                                          ~Perm2 
14         b <> c /\ c == d /\ n > (s-2)                             Perm2 
15         ~a == b /\ b == c /\ b <> c /\ c == d /\ n > (s-2)        Perm2 
16         ~b <> c                                                   ~Perm2 
17         n < (s-2)                                                 ~Perm2 


Table 24.7 Table Depicting the Non-Monotonic Reasoning 
Rules Used to Create the Permutation 3 Logic Engine 


Rule No.   Rule                                        Conclusion 
1          a == b /\ c == d                            Perm3 
2          ~a == b /\ ~c == d                          ~Perm3 
3          a == b /\ c == d /\ n > (s-2)               Perm3 
4          ~a == b /\ ~c == d /\ ~n > (s-2)            ~Perm3 
5          n == (s-2)                                  Perm3 
6          b == c                                      ~Perm3 
7          a == b /\ c == d /\ n == (s-2)              Perm3 
8          n < (s-2)                                   ~Perm3 


Table 24.8 Table Depicting the Non-Monotonic 
Reasoning Rules Used to Create the Permutation 
4 Logic Engine 


Rule No.   Rule                              Conclusion 
1          c == d                            ~Perm4 
2          b == c                            ~Perm4 
3          a == b                            Perm4 
4          ~a == b                           ~Perm4 
5          n > (s-2) /\ a == b               Perm4 
6          ~n > (s-2)                        ~Perm4 
7          ~a == b /\ ~n > (s-2)             ~Perm4 


Table 24.9 Table Depicting the Non-Monotonic 
Reasoning Rules Used to Create the Permutation 
5 Logic Engine 


Rule No.   Rule                              Conclusion 
1          a == b                            ~Perm5 
2          b == c                            ~Perm5 
3          c == d                            Perm5 
4          ~c == d                           ~Perm5 
5          n > (s-2) /\ c == d               Perm5 
6          ~n > (s-2)                        ~Perm5 
7          ~c == d /\ ~n > (s-2)             ~Perm5 
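

To make the precedence scheme concrete, the sketch below encodes the Permutation 4 rules of Table 24.8 and resolves them with a deliberately simplified reading: the highest-numbered rule whose premise holds decides the conclusion. It does not reproduce the CDL proof formulae used by the authors' engine; the data layout and function names are assumptions for illustration.

```python
# Table 24.8 as (rule number, premise, conclusion) triples.
# A conclusion of True stands for Perm4 and False for ~Perm4.
PERM4_RULES = [
    (1, lambda f: f["c==d"], False),
    (2, lambda f: f["b==c"], False),
    (3, lambda f: f["a==b"], True),
    (4, lambda f: not f["a==b"], False),
    (5, lambda f: f["n>(s-2)"] and f["a==b"], True),
    (6, lambda f: not f["n>(s-2)"], False),
    (7, lambda f: not f["a==b"] and not f["n>(s-2)"], False),
]

# Hierarchical weighting stated in the text: 3 > 1 > 2 > 4 > 5.
PRIORITY = [3, 1, 2, 4, 5]


def conclude(rules, features):
    """Return the conclusion of the highest-numbered applicable rule, or None."""
    verdict = None
    for _, premise, conclusion in sorted(rules, key=lambda rule: rule[0]):
        if premise(features):
            verdict = conclusion  # larger-numbered rules override earlier ones
    return verdict


def select_permutation(verdicts):
    """verdicts maps a permutation number to True, False, or None."""
    for perm in PRIORITY:
        if verdicts.get(perm):
            return perm
    return 3  # default candidate when no engine concludes in favour
```

For a gap where a == b and n > (s - 2) hold while b == c and c == d do not, rules 3 and 5 apply and the engine concludes Perm4; if n > (s - 2) is false instead, rule 6 overrides rule 3 and the conclusion is ~Perm4.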


Loading phase: The loading phase consists of the selected permutation being uploaded back into 
the data storage at the completion of the intelligence phase. The user will have the opportunity 
either to load the missing data into the current data repository or to copy the entire data set 
and modify only the copied data warehouse. The latter option would effectively allow the user to revisit 
the original data set in the event that the restored data are not completely accurate. 


24.4.1.2 Experimental Evaluation 


In the following section, we describe the setup of the experimentation 
used in our methodology. First, we discuss the environment used to house the programs. 
This is followed by a detailed discussion of our experimentation, including the four experiments 
we performed and the respective data sets used. These experiments comprise the training of the 
three classifiers, followed by a comparison of the highest performing configurations against 
each other. 

Environment: Our methodology has been coded in the C++ language and compiled with 
Microsoft Visual Studio C++. The code written to derive the lookup table needed for the 
non-monotonic reasoning data has been written in Haskell and compiled using the Cygwin Bash shell. 
All programs were written and executed on a Dell machine with the Windows XP Service Pack 3 
operating system installed. 

Experiments: We have conducted four experiments to adequately measure the performance of 
our methodology. The first experiment involved finding the Bayesian 
network that achieves the highest cleaning rate on RFID anomalies. After this, we investigated the highest 
performing neural network configuration in our second experiment. The third experiment 
was conducted to determine which of the CDL formulae performs most successfully when attempting 
to correct a large number of scenarios. The training cases used in each of these experiments 
consisted of various sets of data containing ambiguous false-negative anomaly cases. 

The fourth experiment we conducted was designed to test the performance of our selected highest 
performing Bayesian network, neural network, and non-monotonic reasoning logic configurations 
to determine which classifier yielded the highest and most accurate clean of false-negative RFID 
anomalies. These techniques were selected in preference to other related work because 
only other state-of-the-art classifying techniques can be compared with respect to seeking a 
single solution from a highly ambiguous situation. 

The testing sets for the fourth experiment included four data repositories consisting of 500, 1,000, 
5,000, and 10,000 ambiguous false-negative anomaly cases. Our scoring system counts a case as correct if 
the respective methodology returns the correct permutation of data that had been 
previously defined. All data within the training and testing sets have been simulated to emulate real 
RFID observational data. 

Database structure: To store the information recorded from the RFID reader, we utilize portions 
of the “Data model for RFID applications” DMRA database structure found in Siemens Middleware 
software [55]. Additionally, we have introduced a new table called MapData designed to store the 
map data crucially needed within our application. Within the MapData table, two reader IDs are 
stored in each row to dictate if the two readers are geographically within proximity. The structures 
of the two tables we are using in experimenting include the following: 

OBSERVATION (Reader_id, Tag_value, Timestamp) 
MAPDATA (Reader1_id, Reader2_id) 
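
The two tables can be exercised with a small in-memory example. The sketch below uses SQLite from the Python standard library purely for illustration; the column types, the sample rows, and the helper names `tag_stream` and `in_proximity` are assumptions and are not taken from the DMRA structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE OBSERVATION (Reader_id TEXT, Tag_value TEXT, Timestamp INTEGER);
    CREATE TABLE MAPDATA (Reader1_id TEXT, Reader2_id TEXT);
    """
)
# Hypothetical sample data: two tags and three readers arranged in a line.
conn.executemany(
    "INSERT INTO OBSERVATION VALUES (?, ?, ?)",
    [("R2", "T1", 2), ("R1", "T1", 1), ("R1", "T2", 1), ("R3", "T1", 5)],
)
conn.executemany("INSERT INTO MAPDATA VALUES (?, ?)", [("R1", "R2"), ("R2", "R3")])


def tag_stream(connection, tag):
    """Return the time-ordered (reader, timestamp) stream of a single tag."""
    rows = connection.execute(
        "SELECT Reader_id, Timestamp FROM OBSERVATION "
        "WHERE Tag_value = ? ORDER BY Timestamp",
        (tag,),
    )
    return rows.fetchall()


def in_proximity(connection, reader1, reader2):
    """True if MAPDATA lists the two readers as adjacent, in either order."""
    row = connection.execute(
        "SELECT 1 FROM MAPDATA WHERE (Reader1_id = ? AND Reader2_id = ?) "
        "OR (Reader1_id = ? AND Reader2_id = ?)",
        (reader1, reader2, reader2, reader1),
    ).fetchone()
    return row is not None
```

Here `tag_stream(conn, "T1")` returns the readings of tag T1 in timestamp order, which is the per-tag view the analysis phase works on.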

Assumptions: We have made three assumptions that are required for the entire process to be 
completed. The first assumption is that the data recorded will be gathered periodically. The second 
assumption is that the interval elected for the periodic 
readings is less than the physical time needed to move from one reader to another. This 
is important as we base our methodology on the premise that consecutive readings 
will not skip over readers that are geographically connected according to the MapData. The final 
assumption is that all readers and items to be tracked are enclosed in a static 
environment whose readers cover the tracking area. 
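
Under the first assumption (periodic readings), the number of missing readings in a gap follows from the timestamps alone. A minimal sketch, assuming a known reading period and a stream sorted by time; the function name and the rounding of the interval are choices made for the example:

```python
def find_gaps(stream, period):
    """Locate false-negative gaps in a tag stream.

    stream is a list of (reader, timestamp) pairs sorted by timestamp.
    Returns (index, n) pairs: n readings are missing after stream[index].
    """
    gaps = []
    for index in range(len(stream) - 1):
        elapsed = stream[index + 1][1] - stream[index][1]
        missing = round(elapsed / period) - 1  # expected readings in between
        if missing > 0:
            gaps.append((index, missing))
    return gaps
```

For a stream read once per time unit at timestamps 0, 1, 4, and 5, the function reports one gap after the second reading with n = 2.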


24.4.1.3 Results and Analysis 


To thoroughly test our application, we devised four different examinations, which we have labeled the 
Bayesian network, neural network, non-monotonic reasoning, and false-negative comparison experiments. 
In the Bayesian network false-negative experimentation, we conducted various investigations into 
both static and dynamic configurations to attempt to achieve the highest cleaning performance. Similar 
to the Bayesian network experiment, the neural network false-negative experimentation involved 
finding the highest performing training algorithm to configure the network and obtain the highest performing 
clean. The non-monotonic reasoning false-negative experiment compared the cleaning rate of each 
of the CDL formulae. The highest performing non-monotonic reasoning setup was then compared 
to the Bayesian and neural network approaches to find the highest performing cleaning algorithm. 


24.4.1.3.1 Bayesian Network False-Negative Experiment 


We decided to augment the Bayesian network with a genetic algorithm to train the network based 
on a test set we developed. The training experiment ran the genetic 
algorithm for 250 generations to determine the fittest chromosome. To this end, we 
trained and compared the genes of chromosomes with population sizes from 100 to 1000 
chromosomes, incremented by 100 in each sequential experiment. The data set used to determine 
the fitness of the chromosomes contained every possible combination of the analysis variables and its 
correct permutation answer. From the results, we have found that the fittest chromosome 
resulted when 500 chromosomes were introduced into the population. Unlike the other 
experimentation we have performed in this research, the Bayesian network has been measured by 
the number of inserts that were exactly correct as opposed to measuring how correct the entire data 
set is. This results in the Bayesian network achieving a relatively low cleaning rate, as permutations 
that are not an exact match with the training results are counted as incorrect even when the resulting 
imputed data may actually be correct in the data set. 


24.4.1.3.2 Neural Network False-Negative Experiment 


The neural network configuration experiment has the goal of seeking out the training algorithm that 
yields the highest clean rate. The two training methods used for comparison are the BP and genetic 
algorithms. The training set utilized in this experiment comprises every possible combination 
of the inputs and their respective outputs, amounting to a total of 128 entries. We have conducted 
tests upon three different training algorithm setups: the first is the BP algorithm and the other 
two are genetic algorithms that use 20 and 100 chromosomes to find the optimized solution. 
Additionally, we conducted each experiment in three trials to further generalize our results. The 
algorithm that had the hardest time finding the correct configuration was the BP algorithm when it 
iterated 50 and 100 times, earning it a 1.56% classification rate. The trainer that performed the 
best was the genetic algorithm, with both 20 and 100 chromosomes, in every test and at every iteration 
count except the 20-chromosome configuration that ran for 5 generations. 

The analysis we performed on these results consisted of plotting the average of our findings 
as a bar graph to illustrate the difference between algorithms. On average, the neural network genetic 
algorithm performed the best, obtaining an 87.5% cleaning rate. As noted 
above, the BP algorithm performed the weakest of the three algorithms. We 
believe the poor results of the BP algorithm were due directly to over-training the network. We 
noticed that there was also a particularly low result when attempting to train this algorithm for 50 
iterations in trial 2. However, it was not until the 100-iteration training that we could clearly see the 
effects of the training routine on the average cleaning rate [51]. 
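
The size of the training set follows from the seven binary inputs: 2^7 = 128 combinations. The short check below enumerates them and relates the reported rates to counts out of 128. The correspondence (1.56% to 2 entries, 87.5% to 112 entries) is an arithmetic observation rather than a figure stated in the source, and since 87.5% is reported as an average, the per-trial counts are not known.

```python
from itertools import product

# Every combination of the seven binary analysis inputs.
training_inputs = list(product([0, 1], repeat=7))


def rate(correct, total=len(training_inputs)):
    """Classification rate in percent for `correct` entries out of `total`."""
    return 100.0 * correct / total


print(len(training_inputs))  # 128
print(rate(2))               # 1.5625
print(rate(112))             # 87.5
```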


24.4.1.3.3 NMR False-Negative Experiment 


We created the third experiment with the goal of determining which of the five CDL formulae 
would be able to clean the highest rate of highly ambiguous missing RFID observations. We did 
this by comparing the cleaning results of the μ, α, π, β, and δ formulae on various training cases. 
There were three training sets in all, with 100, 500, and 1000 ambiguous false-negative anomalies. 
Additionally, at the completion of these experiments, the average was determined over all three test 
cases and was used to ascertain which of the five formulae would be used within the significance 
experiment. 

We have found that the formula achieving the highest average is α (alpha). This 
is probably due to the fact that it discovers cases upon which both the β and π formulae agree, 
thereby increasing the intelligence of the decision. Also of note is that the disjunction of the β and π 
formulae, given by δ, achieves a relatively high average cleaning rate. The lowest 
average cleaning rate has been found to be that of β, which is probably due to its nonacceptance of 
ambiguity when drawing its conclusion. We believe it is crucial for the cleaner to have a low level of 
ambiguity when drawing its conclusions, as the problem of missed readings needs a level of probability 
to infer what readings need to be replaced. As stated previously, we have chosen the α formula as 
the highest performing cleaner to be used within the classifier comparison experiment [54]. 


24.4.1.3.4 False-Negative Comparison Experiment 


Our fourth experimental evaluation was designed to put three classifiers through a 
series of test cases with large amounts of ambiguous missing observations. The three different 
classification techniques were our NMR engine with CDL using the α formula, compared 
against both Bayesian and neural networks with the highest performing configurations obtained 
from previous experimentation. We designed the experiment to have an abnormally high number 
of ambiguous false-negative anomalies, consisting of 500 and 1000 test cases, to thoroughly evaluate 
each approach. Following the conclusion of these experiments, we derived the average of each 
technique to find the highest performing classifier [54]. 

The results of this experimentation, shown in Figure 24.11, indicate that the neural network 
obtained a higher cleaning average than both the Bayesian network and NMR classifiers, 
with 86.68% accuracy. The lowest performing cleaner was found to be the Bayesian network. 

Figure 24.11 The revised results of the Bayesian network, neural network, and non-monotonic 
reasoning classifiers when attempting to clean an evenly spaced amount of false-negative 
anomalies. 

From these findings, we have determined that the neural network has provided the highest 
accuracy when attempting to clean ambiguous false-negative anomalies. It is interesting to note 
that the probabilistic approach has actually outperformed the deterministic methodology in the 
case of imputing missing data. We believe that this is because a level 
of ambiguity and probability needs to be introduced when correcting missing data, due to the lack 
of information available to the classifier. In this case, a methodology that attempts to investigate the 
validity of observations not normally considered by deterministic approaches would yield a higher 
cleaning rate. 


24.4.2 False-Positive Cleaning 


In this section, we have modified our RFID anomaly management system to identify and eliminate 
false-positive observations. We will first review the motivation and architecture of the novel 
concept. This includes the feature set definition, Bayesian network, neural network, non-monotonic 
reasoning, and loading phases. Next, we provide details of the ideal scenario to design our system 
most effectively and list any assumptions we have made to ensure that the algorithms operate 
correctly. Finally, we provide an experimental evaluation of each classifier's performance 
and discuss which provides the highest accuracy when correcting wrong and duplicate anomalies. 


24.4.2.1 System Architecture 


The design of our system has been broken into three sections: the feature set definition phase, the 
classifier phase, and the modification phase. The feature set definition phase is the first process 
conducted within our application, in which raw data are searched and sorted to find suspicious 
readings and the circumstances surrounding each of these observations. The classifier phase is where 
the system branches into one of three different classifiers: the Bayesian network, the neural network, or 
NMR. Each classifier has one goal, which is to determine whether the flagged reading should be deleted 
or kept within the data set. This decision is based solely upon the input gathered from the feature 
set definition phase. After the classifier has determined the validity of the observation, it 
passes the decision on to the modification phase, which either deletes or keeps the value passed 
to it. 


24.4.2.1.1 Feature Set Definition Phase 


The first stage of the program is the feature set definition phase, whose main goal is to analyze the 
data to discover suspicious readings and investigate key characteristics surrounding the flagged 
observation. Initially, this phase breaks the tag readings into tag streams designed to analyze 
the route of one tag. A tag will be flagged as suspicious if the difference in timestamps exceeds the 
user-defined duration it should take to reach the location, or if the geographical locations of the 
readers are not within proximity. To determine the geographic validity of the readers, the program 
utilizes a table named MapData that is constructed by the user and reflects the geographic layout 
of all adjacent readers within the static environment. As illustrated in Figure 24.12, there are five 
observational values that are ascertained: a, b, x, c, and d. The values of observations a and b are 
the readings taken two positions and one position, respectively, before the suspicious reading x. The c and d 
readings are the observations that have been recorded once and twice after the suspicious reading. 
For all of these observations, the timestamp and the location are recorded and used in further 
analysis. 

Figure 24.12 Graphical representation of a tag stream with observations that are to be examined 
highlighted as a, b, x, c, and d. 


After the values of a, b, x, c, and d with their respective timestamps and locations have 
been found, the feature set definition phase further investigates key characteristics of the data. 
The characteristics comprise 10 different binary mathematical operations; however, additional 
characteristics may be added by the user. Each of the characteristics contains spatial and temporal 
information regarding the observations before and after the suspicious reading. With regard to 
the proximity of timestamps, we have utilized the value of half a second. This time value may be 
altered to better suit the application for which it is designed. The characteristics we discovered are 
as follows: 

■ b.loc <> x.loc 
■ c.loc <> x.loc 
■ b.time == x.time 
■ c.time == x.time 
■ b.loc == x.loc 
■ c.loc == x.loc 
■ a.loc <> x.loc 
■ d.loc <> x.loc 
■ b.time <> x.time 
■ c.time <> x.time 


The data used in these characteristics are five different values that have two sub-values each. 
The five values are the observations a, b, x, c, and d, each of which has its time (time) 
and location (loc) stored. The characteristics our methodology uses in analysis 
test whether two values are within a certain proximity, which is represented as <>, or are equivalent, 
which is represented as ==. It is important to note that the proximity operator 
has different meanings for location and for time. 
With regard to the location information (loc), the proximity is determined by the “MapData” 
information, whereas the time proximity refers to the temporal interval between two observations 
being within the user-defined time value (e.g., two observations being within 5 seconds 
of each other). After all these characteristics have been gathered, they are passed on to the various 
classifying methodologies as inputs to determine whether or not the flagged item should remain 
within the data set. 
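
The ten characteristics can be computed directly from the five observations. The sketch below is illustrative only: the `Obs` record, the helper names, the half-second default tolerance, and the decision to treat location proximity strictly as "a MapData pair exists" (so that identical readers are not automatically counted as proximate) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Obs:
    loc: str     # reader identifier
    time: float  # timestamp in seconds


def near_loc(p, q, map_data):
    """Location proximity (<>): the two readers appear as a pair in MapData."""
    return (p.loc, q.loc) in map_data or (q.loc, p.loc) in map_data


def near_time(p, q, tolerance=0.5):
    """Time proximity (<>): timestamps lie within the user-defined tolerance."""
    return abs(p.time - q.time) <= tolerance


def characteristics(a, b, x, c, d, map_data, tolerance=0.5):
    """Return the ten Boolean characteristics in the order listed in the text."""
    return [
        near_loc(b, x, map_data),
        near_loc(c, x, map_data),
        b.time == x.time,
        c.time == x.time,
        b.loc == x.loc,
        c.loc == x.loc,
        near_loc(a, x, map_data),
        near_loc(d, x, map_data),
        near_time(b, x, tolerance),
        near_time(c, x, tolerance),
    ]
```

The resulting list of ten Booleans is the input vector handed to whichever classifier is selected.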


24.4.2.1.2 Classifier Phase: Bayesian Network 


The first option that the classifier phase can utilize is the Bayesian network. In this example, we have 
considered the Bayesian network to have 10 inputs that correspond to the analytical characteristics 
found at the end of the feature set definition phase. Using these 10 inputs and the weights 
it has obtained through training, the Bayesian network determines whether the flagged 
observational reading should be kept within the database. This results in one output known as 
the keep_value, which is set to either true or false. The keep_value output is passed to 
the modification phase at the end of this process, and the entire procedure repeats 
each time a suspicious reading is encountered. We also map all binary inputs from 0 and 1 
to 0.1 and 0.9, respectively, so that later mathematical functions avoid a 
multiplication by zero or one. 

We have chosen a genetic algorithm to train the Bayesian network weights based on the various 
test cases that may arise. The population size of the genetic algorithm is varied to 
determine the ideal number of chromosomes for training purposes. The mutation rate of 
the genetic algorithm is 1% for the top 10% of chromosomes with regard to fitness 
and 5% for all other chromosomes. After the best weight configuration has been determined, the 
network is compared against the neural network and non-monotonic reasoning 
approaches. 
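
The rank-dependent mutation rate can be sketched as below. The 1%, 10%, and 5% figures come from the text; interpreting the rate as a per-gene probability, perturbing a gene with Gaussian noise, and the `scale` parameter are assumptions, and the function name is hypothetical.

```python
import random


def mutate_population(population, fitness, rng,
                      elite_share=0.10, elite_rate=0.01, other_rate=0.05,
                      scale=0.1):
    """Mutate weight chromosomes with a lower rate for the fittest ones.

    population is a list of chromosomes (lists of float weights) and
    fitness is a callable scoring one chromosome (higher is fitter).
    """
    ranked = sorted(population, key=fitness, reverse=True)
    n_elite = max(1, int(len(ranked) * elite_share))
    mutated = []
    for rank, chromosome in enumerate(ranked):
        rate = elite_rate if rank < n_elite else other_rate
        mutated.append([
            # assumption: each gene mutates independently with probability rate
            gene + rng.gauss(0.0, scale) if rng.random() < rate else gene
            for gene in chromosome
        ])
    return mutated
```

Selection and crossover, which a complete genetic algorithm also needs, are left out; the sketch isolates only the two-tier mutation step.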


24.4.2.1.3 Classifier Phase: Neural Network 


The ANN is the second option we have chosen for the classifier phase; it utilizes weighted neurons 
to determine the validity of the flagged value. Like the Bayesian network, this ANN passes the 
10 inputs gathered from the feature set definition phase through the network to obtain 1 
output. The network comprises a single hidden layer with 11 hidden nodes, resulting in 121 weights 
between all the nodes. We specifically wanted to choose more hidden units than inputs and only 
one layer, as we have found that multilayered networks do not necessarily enhance the performance 
of the classifier. 
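
The figure of 121 weights can be checked by counting the connections between consecutive layers: 10 × 11 between the inputs and the hidden nodes, plus 11 × 1 between the hidden nodes and the output. The helper below is a small illustration of that arithmetic; it assumes fully connected layers and no bias weights, which is the reading that reproduces 121.

```python
def count_weights(layer_sizes):
    """Number of weights in a fully connected network without bias terms."""
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


print(count_weights([10, 11, 1]))  # 10*11 + 11*1 = 121
```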

We have also set the momentum and learning rates to 0.4 and 0.6, respectively, and have 
utilized a sigmoidal activation function. Additionally, as with the Bayesian network, we use 
the numbers 0.1 and 0.9 rather than the binary numbers 0 and 1, respectively. Two prominent 
training algorithms have been utilized to configure the neural network. The first is the 
BP algorithm, while the second is the genetic algorithm, which has also been utilized within the 
Bayesian network. Both algorithms use a limited number of iterations as the stopping criterion for 
the training. 


24.4.2.1.4 Classifier Phase: Non-Monotonic Reasoning 


The final classifier we have utilized within our implementation is the NMR logic engine. The 
algorithm utilizes a series of rules that we have created based upon the input analysis variables 
obtained from the feature set definition phase. From this, the logic engine determines the correct 
course of action, either to keep the value or not, based on the different levels of ambiguity we enforce. 
The rules utilized within the logic engine may be examined in Table 24.10. The four symbols that 
are used to interact with the values within the rules are the logical AND operator /\, the negation 
operator ~, the equality operator ==, and our use of the double arrow <> to illustrate proximity 
between the two analysis variables. 

As a default case where neither keep_val nor ~keep_val is concluded, the logic engine will 
keep the flagged reading to avoid artificially introducing false-negative observations. Additionally, 
the order in which the rules have been written in this document is also the order of priority with regard 
to finding the conclusion. 

Table 24.10 Table Containing all the Rules and Respective Conclusions Utilized 
in the Non-Monotonic Reasoning Engines 


Rule No.   Rule                                                                          Conclusion 
1          c.time <> x.time /\ ~c.loc == x.loc                                           keep_val 
2          b.time <> x.time /\ ~b.loc == x.loc                                           keep_val 
3          ~b.loc <> x.loc /\ ~c.loc <> x.loc /\ ~a.loc <> x.loc /\ ~d.loc <> x.loc      ~keep_val 
4          ~b.loc <> x.loc /\ ~c.loc <> x.loc                                            ~keep_val 
5          a.loc <> x.loc /\ d.loc <> x.loc                                              keep_val 
6          b.loc <> x.loc /\ c.loc <> x.loc                                              keep_val 
7          b.loc <> x.loc /\ c.loc <> x.loc /\ ~b.time == x.time /\ ~c.time == x.time    keep_val 
8          c.time == x.time                                                              ~keep_val 
9          b.time == x.time                                                              ~keep_val 
10         b.time == x.time /\ c.time == x.time                                          ~keep_val 
11         b.loc <> x.loc /\ c.loc <> x.loc /\ ~b.time == x.time /\ ~c.time == x.time    keep_val 
12         b.loc == x.loc /\ c.loc == x.loc /\ b.time == x.time /\ c.time == x.time      ~keep_val 


24.4.2.1.5 Modification Phase 


After the intelligent classifier has determined whether to keep or delete the flagged reading, 
it passes its decision to the modification phase. Once the decision has been received, the application 
deletes the identified value from the original data warehouse if the decision is not to keep it. 
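
The decision handling can be summarised in a few lines. This sketch is an illustration under stated assumptions: the stream is held as an in-memory list rather than a database table, the function name is hypothetical, and an undecided result (None) is treated as "keep", following the default case described for the NMR engine.

```python
def apply_decision(stream, index, keep_val):
    """Return the tag stream after applying a classifier decision.

    keep_val is True (keep), False (delete), or None (no conclusion).
    The flagged reading is removed only on an explicit False.
    """
    if keep_val is False:
        return stream[:index] + stream[index + 1:]
    return list(stream)
```

Returning a new list rather than editing in place leaves the original stream available should the deletion later prove to be wrong.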


24.4.2.2 Experimental Results and Analysis 


In order to investigate the applicability of our concepts, we conducted four experiments. The 
first three were dedicated to finding the highest achieving configuration of the Bayesian network, 
neural network, and non-monotonic reasoning classifiers, whereas in the fourth the highest achieving 
classifiers were compared against each other to find which one achieves the highest cleaning rate. 
In this section, we describe the database structure, assumptions, and environment in which we 
conducted these experiments, followed by the experimental evaluation and the results obtained. 
All experimentation was performed with the same database structure and 
computer as the false-negative experimentation. 


24.4.2.2.1 Environment 


As outlined earlier, there are four main experiments that were conducted using the methodology. 
The first experiment was designed to find the highest performing genetic algorithm when training 
the Bayesian network. For this experiment, the number of chromosomes in the population was 
varied to find the highest performing value. The second experiment was designed to 
discover which training algorithm, BP or the genetic algorithm, obtained the highest 
cleaning rate. For this experiment, the number of chromosomes was varied and compared with 
the BP algorithm to determine the highest achieving algorithm. The third experiment was designed 
to determine which formula achieved the highest cleaning rate within the NMR approach. 

We specifically chose only to examine classifier techniques, as the related work is not comparable 
because it is either unable to clean ambiguous data or does not use an automated process. The 
last experiment took the highest achieving configurations of each of the 
classifiers and compared each methodology against the others. Four data sets were utilized for this 
experimentation; the first three were training sets in which 500, 1000, and 5000 scenarios were used 
to train the algorithms and find the optimal configuration. Each training set contained different 
scenarios to avoid the risk of over-fitting the classifiers. 

The second group of data comprised three testing sets in which 1,000, 5,000, and 10,000 randomly chosen 
scenarios were selected and passed to the application to have the anomalies eliminated. Each of 
these testing sets contained feature set definitions generated within our sample scenario. After 
each of the training and testing experiments had been conducted, the average cleaning rate of 
the experiments was derived for each technique and used to identify the highest achieving 
method. 


24.4.2.2.2 Bayesian Network Experiment 


For our first experiment, we investigated the optimal number of chromosomes needed to clean
the false-positive anomalies. To accomplish this, we created three Bayesian networks configured
using a genetic algorithm with 10, 50, and 100 chromosomes. Each network was trained for 10
generations to breed and optimize the configuration. For training, we used three different
“training cases” comprising 500, 1000, and 5000 false-positive anomaly scenarios. After these
experiments were completed, the average over the three training cases was computed and used to
determine the number of chromosomes needed to achieve the highest cleaning rate.
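
The procedure above can be illustrated with a minimal genetic algorithm sketch. The chapter does not specify the chromosome encoding, fitness function, selection scheme, or mutation rate, so everything below is an assumption made for illustration: truncation selection, single-point crossover, the `toy_fitness` stand-in for a cleaning rate, and all function and parameter names. Only the population sizes (10, 50, and 100) and the 10 generations come from the text.

```python
import random


def evolve(fitness, n_genes, population_size, generations=10,
           mutation_rate=0.1, seed=0):
    """Return (best chromosome, its fitness) after a fixed number of generations."""
    if population_size < 2 or n_genes < 2:
        raise ValueError("need at least 2 chromosomes and 2 genes")
    rng = random.Random(seed)
    # Each chromosome is a list of genes drawn uniformly from [0, 1).
    population = [[rng.random() for _ in range(n_genes)]
                  for _ in range(population_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Truncation selection (assumed): the fitter half become parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: max(2, population_size // 2)]
        children = []
        while len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)  # single-point crossover
            child = mother[:cut] + father[cut:]
            # Mutation: occasionally replace a gene with a fresh random value.
            child = [rng.random() if rng.random() < mutation_rate else gene
                     for gene in child]
            children.append(child)
        population = children
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate  # remember the best chromosome seen so far
    return best, fitness(best)


def toy_fitness(chromosome):
    """Hypothetical stand-in for a cleaning rate: closeness to a fixed target."""
    target = [0.2, 0.8, 0.5, 0.1]
    return -sum((g - t) ** 2 for g, t in zip(chromosome, target))


# Compare the three population sizes used in the experiment.
scores = {size: evolve(toy_fitness, n_genes=4, population_size=size)[1]
          for size in (10, 50, 100)}
for size, score in scores.items():
    print(f"{size} chromosomes: best toy fitness {score:.4f}")
```

In the actual experiment the fitness would be the cleaning rate of the network that a chromosome encodes, averaged over the three training cases; the toy objective here only shows the mechanics.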

From our results, we found that the configuration that used 10 chromosomes to train
the network obtained the highest average cleaning rate. As a result, the Bayesian network using
a genetic algorithm with 10 chromosomes is utilized in the final experiment, in which all
three classifiers are compared. The lowest achieving individual result was the Bayesian network
trained with 100 chromosomes on 500 training cases. The highest achieving individual results
were the configurations with 10 and 50 chromosomes on 500 and 1000 training cases,
respectively.


24.4.2.2.3 Neural Network Experiment 


The second experiment was designed to determine the highest performing configuration of a
neural network for cleaning anomalous RFID data. To do this, we trained
the weights of the networks using the BP algorithm and the genetic algorithm with 10 (GA-10),
50 (GA-50), and 100 (GA-100) chromosomes. The performance of the resulting networks was
determined from the correctness of the classification on three training cases of 500, 1000, and
5000 false-positive anomaly scenarios. Each configuration was trained for 10 iterations or
generations before the experiment commenced. The main goal of this experiment was to
determine the network trainer with the highest average performance; thus, the average over the
three training cases was also computed.

From the experimental results, we derived the general observation that the performance of
the network is vastly improved when the genetic algorithm uses 50 or 100 chromosomes. The
highest average performance of the neural network was obtained by the genetic algorithm
trained with 50 and with 100 chromosomes. We decided to use the genetic algorithm with
100 chromosomes, as its attempt to clean 500 training cases achieved the highest result. The lowest
performing cleaning algorithm was the BP algorithm when attempting to clean 1000 training cases.


24.4.2.2.4 NMR Experiment 


The main goal of the third experiment was to derive the highest performing NMR formula from
the five different options used in CDL. With this in mind, the μ (Mu), α (Alpha), π (Pi), β
(Beta), and δ (Delta) formulae were each trained using three training cases containing 500, 1000,
and 5000 false-positive scenarios each. Like the previous two experiments, the average of each
performing algorithm was ascertained and used to determine which of the five formulae would be
utilized to proceed onto the final experiment.

From the results, we have found that the highest performing formulae are μ, α, and π.
In contrast, the β and δ formulae both performed the least cleaning. With regard to the final
experimentation, we have chosen the π formula as it performed the highest and is the most likely
to continue to perform highly. The reasons as to why we rejected the μ and α formulae lie in the
fact that the μ formula is strict in that it only accepts factual information and the α formula is
connected directly to the β. Hence, we determined that the π formula would be superior to the
other formulae.


24.4.2.2.5 Comparison Experiment 


The goal of the fourth experiment was to determine which of the three highest performing classifier
techniques would clean the highest percentage of a large number of false-positive RFID anomalies.
The three classifiers used in this experiment were the Bayesian network trained by a genetic
algorithm with 10 chromosomes (BN), the neural network trained by a genetic algorithm with 100
chromosomes (NN), and the π formula of the NMR. The classifiers were all chosen based upon the
high performance found within the first three experiments. Both the Bayesian
and neural networks were trained for 10 generations before these tests were conducted. In
contrast to the previous experiments, three “testing cases” containing 1000,
5000, and 10,000 randomly chosen false-positive scenarios were used to determine the
highest performing classifier. To ascertain the highest performance, the average over the three
test cases was computed from the results.

The results of this experiment are depicted in Figure 24.13, where the number of test cases
and the classifier are graphed against the percentage of correctness. From these results, it can
be seen that the NMR engine achieves the highest average cleaning rate among the classifiers.
The highest performing classifier was found to be the NMR when attempting to clean
1000 test cases, whereas the lowest achieving classifier was found to be the neural network
when attempting to clean 10,000 test cases.

Figure 24.13 Experimental results of the fourth experiment designed to find the highest
performing classifier when faced with a large amount of false-positive anomalies.

The NMR engine outperformed the other classifiers in dealing with false-positive data because
it is a deterministic approach. The Bayesian and neural networks, by contrast, rely
on probabilistic training of their respective networks. The major drawback of this system
is that it is specifically tailored for the static RFID cleaning problem; however, we believe that
the same concept may be applicable to any static spatial-temporal data enhancement case study.
To apply our methodology to other applications where the environment is dynamic,
the feature-set definition and NMR will need greater complexity to accommodate the change
in anomalies. Although the test cases utilized in experimentation were small in comparison to
the immense number of RFID readings recorded in real-world systems, we believe our
methodology would behave similarly on larger data sets.


24.5 Conclusion 


In this chapter, we have discussed the issues associated with anomalies present in captured RFID data
and presented solutions to improve its integrity. First, we proposed deterministic and probabilistic
anti-collision approaches, which increased the efficiency of the system. We also
performed a comparative analysis of our two proposed anti-collision
methods and identified the benefits and disadvantages of each approach. We then proposed deferred
cleaning approaches, to be applied after the filtering of the data is complete, to correct any ambiguous
anomalies still present in the stored observations. For post-capture cleaning, we have integrated the
Bayesian network, neural network, and non-monotonic reasoning classifiers to introduce a high
level of intelligence to combat the ambiguous anomalies.

First, we proposed a deterministic anti-collision algorithm using combinations of Q-ary trees,
with the intended goal of minimizing the memory used by the RFID reader's queries. By reducing
the size of queries, the RFID reader can conserve memory, and the identification time can be
improved. We then introduced the probabilistic group-based anti-collision method to improve
the overall performance of the tag recognition process and provide sufficient performance relative to
existing methodologies. We also performed a comparative analysis of our proposed deterministic
joined Q-ary tree and PCT, and identified the benefits and disadvantages of each approach for
specific circumstances. Empirical analysis shows that the joined Q-ary tree method can achieve
higher efficiency if the right EPC pattern is configured. However, for arbitrary situations where
an EPC pattern cannot be found, it is preferable to use the probabilistic approach rather than the
deterministic method.

After the filtering has been completed with the anti-collision algorithms, we introduced the
use of intelligent classifiers to discover ambiguous anomalies and decide upon the action to be
taken to correct them. With regard to false-negative anomaly detection and correction, we
found that the highest performing classifier is the neural network. In contrast, we found that the
NMR classifier achieved the highest cleaning rate when correcting the false-positive anomalies.
From the results, we have seen that it is much easier to return a database to the highest integrity
when cleaning false-positive anomalies than when cleaning false-negatives. Additionally, we
found that the NMR classifier achieved the highest false-positive cleaning rate
because its deterministic nature makes it ideal for cleaning wrong and duplicate data, whereas
probabilistic techniques would introduce an additional level of ambiguity. With regard to the
false-negative anomalies, the neural network classifier achieved the highest cleaning rate because
it is able to introduce the limited amount of ambiguity needed to find the ideal missing values.


25.1 RFID for Telemedicine


25.1.1 Telemedicine: Is It Necessary? 


The cost of individual health care and of the health-care industry is one of the most debated
topics in recent times. According to the U.S. Centers for Medicare & Medicaid
Services, the average cost per patient per stay increased from $1851 in 1980 to $8793
in 2005 [1]. The average annual health-care expenditure per consumer
increased from approximately $1500 in 1990 to approximately $2800 in
2007 [2] (Figures 25.1 and 25.2).

The direct effect of the rising health-care costs can be observed in Figures 25.3 through 25.5,
which project the trends in hospital emergency room visits, hospital inpatient days, and hospital
outpatient days per 1000 population, respectively, during 1999-2009 [3]. The trend graphs clearly
show decrease in inpatient numbers and an increase in outpatient numbers. A rise in emergency
room visits also suggests that patients are unwilling to visit hospitals unless absolutely necessary
and/or life threatening.

Figure 25.1 Average cost per patient per stay trend (1980-2005).

Figure 25.2 Average annual expenditure per consumer for health care in a year (1990-2007).

Figure 25.3 Emergency room visits per 1000 population (1999-2009).

Figure 25.4 Inpatient days per 1000 population (1999-2009).

According to the U.S. Department of Health and Human Services, total health expenditures
as a percentage of the gross domestic product of the United States increased from 5.1% in 1960
to 15.3% in 2005 [4,5]. This percentage is the highest among several developed and
developing countries. Per capita health expenditures
in the United States increased from $147 in 1960 to $6410 in 2007.

With the exponentially increasing cost of patient care and liability in medical institutions,
there is ongoing research into consulting and treating patients via remote methods. The current
trends in health-care costs and developments in wireless sensors make telemedicine a reasonable
next step.

Radio frequency identification (RFID) technology, primarily developed to replace and overcome
the limitations of the bar code technology, is rapidly finding applications in several industries and
particularly in the health-care industry. Many hospitals across the globe are adapting to this latest
wireless technology by integrating the available biomedical sensors with commercially available
RFID networks to create an intelligent wireless health-care network that in recent times is more
generally known as mHealth (Mobile Health) or telemedicine. It is widely believed and aspired
that telemedicine is the solution for growing health-care needs, cost, and resources in developed as
well as developing countries.

Figure 25.5 Outpatient days per 1000 population (1999-2009).


25.1.2 Fundamentals of RFID 


RFID is a recent wireless communication technology that was originally intended
for item tracking applications. The hardware modules of an RFID system can be broadly
classified into the interrogator (also called the reader) and the tag. The interrogator is analogous
to the bar code reader and the tag is analogous to the bar code sticker. The primary advantages of
RFID over traditional bar code technology are speed, memory, and beyond line-of-sight
communication. To explain further, RFID can identify more tags per unit time than bar
code technology. An RFID tag has more memory than a bar code sticker, which allows
detailed information about the item to be stored on the tag attached to it and eliminates the
need to refer to a database to store and/or retrieve additional information. The most important
advantage of RFID over bar code is the ability to communicate beyond line of sight between the
interrogator and the tag.

To simplify the application of the technology, RFID technologies allow the transmission of a 
unique serial number wirelessly, using radio waves. The two fundamental parts of the system that 
are necessary to do this are the RFID tag and the RFID interrogator. The interrogator requests 
all tags in its range to identify themselves. The tags respond by transmitting their unique serial 
number. Attaching an RFID tag to a physical object allows the object to be “seen” and monitored 
by existing computer networks and office administration systems. 
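
The request and response exchange described above can be sketched in a few lines. This is an idealized model rather than any real RFID protocol: it ignores tag collisions, signal strength, and timing, and the class names, the `distance_m` attribute, and the serial numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """A tag holding a unique serial number (all values here are illustrative)."""
    serial: str
    distance_m: float  # distance from the interrogator, assumed known

    def respond(self) -> str:
        # A tag only ever answers a request; it never initiates communication.
        return self.serial


class Interrogator:
    """A reader that asks every tag within its range to identify itself."""

    def __init__(self, read_range_m: float):
        self.read_range_m = read_range_m

    def inventory(self, tags):
        # Idealized: every tag in range answers and no two answers collide.
        return [tag.respond() for tag in tags
                if tag.distance_m <= self.read_range_m]


reader = Interrogator(read_range_m=2.0)
tags = [Tag("TAG-0001", 0.5), Tag("TAG-0002", 1.9), Tag("TAG-0003", 4.0)]
print(reader.inventory(tags))  # serial numbers of the tags within 2 m
```

In practice several tags answering at once collide, which is why anti-collision schemes are needed on top of this basic exchange.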

RFID can be broadly classified into active RFID and passive RFID. Active RFID tags are powered
by an onboard battery. Passive RFID tags, depending on the distance from the interrogator,
either harvest energy from the interrogator-transmitted electromagnetic energy (near field) or
communicate with the interrogator by modulating and reflecting the incident electromagnetic
energy (far field). It is important to note that most RFID communication protocols are reader
initiated and therefore the tags have to wait for interrogator initiation to transfer or request for
data. Passive RFID technology is typically used for item management applications and active RFID
has more specific applications like cargo tracking, military applications, and sensors [6-8].

The RFID interrogator can be broadly divided into the following blocks from a system design 
engineer’s perspective: 


Antenna 

Power circuit 

Modulator and demodulator 
Data processor 

Memory 

Host interface 


The RFID tag (irrespective of the power source) can be broadly divided into the following blocks 
from a system design engineer’s perspective: 


Antenna 

Power and switching circuit 
Modulator and demodulator 
Data processor 

Memory 

Attached sensors 


Let us try to understand the fundamental design challenges for the RFID system without diving
too deep into the details. The major difference between designing an interrogator and designing a
tag is that the interrogator is not constrained by the amount of power available, as it is typically
powered by an AC outlet. Another major difference is that the dimensions of an interrogator are
typically more flexible. These differences in available power and size greatly affect the degrees of
freedom available to the engineer in designing the antenna and electronic circuitry of the tag.
Memory for the passive tag is typically of the ROM type because of these power constraints, and
this restricts the flexibility and speed of operation of such RFID
systems. When discussing the design challenges of RFID systems, it is essential to
mention the Friis transmission equation (25.1), which is overlooked by most of the available literature:


$$\frac{P_r}{P_t} = G_t(\theta_t,\phi_t)\,G_r(\theta_r,\phi_r)\left(\frac{\lambda}{4\pi R}\right)^{2}\left(1-|\Gamma_t|^{2}\right)\left(1-|\Gamma_r|^{2}\right)\left|\mathbf{a}_t\cdot\mathbf{a}_r^{*}\right|^{2}e^{-\alpha R} \qquad (25.1)$$


where
$P_r$ is the power received
$P_t$ is the power transmitted
$G_t$ is the gain of transmitting antenna
$G_r$ is the gain of receiving antenna
$\theta$ and $\phi$ are the inclination angle and azimuth angle measured in spherical coordinate system
$\lambda$ is the wavelength of electromagnetic energy transmitted
$R$ is the distance between transmitting and receiving antennas
$\Gamma_t$ is the reflection coefficient of transmitting antenna
$\Gamma_r$ is the reflection coefficient of receiving antenna
$\mathbf{a}_t$ is the polarization vector of transmitting antenna
$\mathbf{a}_r$ is the polarization vector of receiving antenna
$e$ is the antenna efficiency
$\alpha$ is the absorption coefficient of transmission medium


The Friis transmission equation relates the power transmitted by the transmitting antenna to the
power received by the receiving antenna as a function of the antenna gain, frequency of operation,
antenna reflection coefficient, antenna polarization, and medium. It is important to understand
that the entire RFID system can operate as designed only if the power received (at the interrogator
or tag) is adequate for the signal to be decoded. As the Friis equation shows, this received power
is highly sensitive to the polarization of the interrogator and tag antennas. While the available
literature portrays varied applications of RFID, the reader should note that not everything that is
possible in theory and in textbooks can be practiced, owing to current limitations in the
manufacturing process of RFID tags that directly affect the cost and dimensions of available tags.
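
Equation 25.1 can be evaluated numerically to see how each factor scales the received power. The sketch below assumes the standard extended form of the Friis equation, with linear (not dB) gains already evaluated at the relevant angles, a polarization match factor between 0 and 1 standing in for the squared dot product of the polarization vectors, and far-field conditions. The function name, parameter names, and example values are illustrative assumptions, not figures from the chapter.

```python
import math


def friis_received_power(p_t_w, g_t, g_r, wavelength_m, distance_m,
                         gamma_t=0.0, gamma_r=0.0,
                         polarization_match=1.0, alpha_per_m=0.0):
    """Received power in watts under the extended Friis equation (far field)."""
    if distance_m <= 0 or wavelength_m <= 0:
        raise ValueError("distance and wavelength must be positive")
    # Free-space spreading term (lambda / (4 * pi * R)) ** 2.
    free_space = (wavelength_m / (4 * math.pi * distance_m)) ** 2
    # Impedance mismatch at each antenna: (1 - |Gamma|^2).
    mismatch = (1 - abs(gamma_t) ** 2) * (1 - abs(gamma_r) ** 2)
    # Exponential absorption in the transmission medium.
    absorption = math.exp(-alpha_per_m * distance_m)
    return (p_t_w * g_t * g_r * free_space * mismatch
            * polarization_match * absorption)


# Hypothetical UHF link: 1 W reader, modest antenna gains, tag 3 m away.
wavelength = 299_792_458 / 915e6  # speed of light / carrier frequency
aligned = friis_received_power(1.0, 4.0, 1.64, wavelength, 3.0)
half_matched = friis_received_power(1.0, 4.0, 1.64, wavelength, 3.0,
                                    polarization_match=0.5)
print(f"aligned: {aligned:.3e} W, half polarization match: {half_matched:.3e} W")
```

Setting `polarization_match` to zero (cross-polarized antennas) drives the received power to zero regardless of the other terms, which is the sensitivity to polarization noted above.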

The phenomena that influence the received electromagnetic energy at the receiver include,
but are not limited to, multipath, diffraction, fading, Doppler shift, noise (internal and external),
interference, and ducting [9,10].


25.1.3 Current Applications of RFID 
Typical applications of RFID include the following: 


Automotive security, automotive location, automotive passive entry systems 

Highway toll booths, traffic congestion detection and avoidance

Livestock tracking, wild animal tracking, pet tracking 

Asset tracking in multiple and varied industries 

Contactless payment and shopping 

Supply chain management, one of the most widely used primary RFID applications


Current applications of RFID in health care include [11-14] the following: 


Item tracking in hospitals and operating rooms 
Patient tracking in hospitals 

Data tracking 

Drug tracking 

Crash cart tracking 

Nurse tracking 

Remote wireless continuous arrhythmia detection 
Implant monitoring 

Smart patient rooms 


25.1.4 RFID Standards and Spectrum Utilization 


The number and use of standards within RFID and its associated industries are quite complex and 
undocumented. They involve a number of bodies and are in a continuous process of development. 
Standards have been produced to cover four key areas of RFID application and use:

■ Air interface standards (basic communication between reader and tag)
■ Data content and encoding (numbering schemes)
■ Conformance
■ Interoperability (as an extension of conformance)


It is important to note that there are no published RFID standards that particularly define the 
physical layer communication protocol or frequencies for biomedical or telemedicine applications. 
The available market solutions adopt a combination of RFID and data communication protocols 
or resort to developing custom proprietary protocols for application-specific solutions. The available
data communication protocols apart from the RFID standards are Zigbee, Bluetooth, IEEE
802.11a/b/g/n, GSM, CDMA, and GPRS.

While application-specific custom communication protocols for sensor networks are understandable
during this phase of telemedicine evolution, the development of standards by a global
entity will become an absolute necessity for expedited growth and sustenance of telemedicine in
the near future.

There are several U.S. and international standard bodies involved in the development and 
definition of RFID technologies including the following: 


■ International Organization for Standardization (ISO)
■ EPCglobal Inc.
■ European Telecommunications Standards Institute (ETSI)
■ Federal Communications Commission (FCC)

RFID communication technologies are governed by the ISO/IEC 18000 family of standards for 
their physical layer of communication. The various parts of ISO/IEC 18000 describe air interface 
communication at different frequencies in order to be able to utilize the different physical behaviors. 
The various parts of ISO/IEC 18000 are developed by ISO/IEC JTC1 SC31, “Automatic Data 
Capture Techniques.” Conformance test methods for the various parts of ISO/IEC 18000 are 
defined in the corresponding parts of ISO/IEC 18047. Performance test methods are defined in 
the corresponding parts of ISO/IEC 18046. A list of parts in the ISO/IEC 18000 family is as 
follows: 


ISO/IEC 18000-1: Generic parameters for the air interface globally accepted
ISO/IEC 18000-2: Communication at frequencies below 135 kHz
ISO/IEC 18000-3: Communication at 13.56 MHz frequency
ISO/IEC 18000-4: Communication at 2.45 GHz frequency
ISO/IEC 18000-6: Communication at frequencies between 860 and 960 MHz
ISO/IEC 18000-7: Communication at 433.92 MHz frequency


Within a given frequency band, the actual (nonideal) communication range will vary depending on the
environment and the factors in the Friis equation. Table 25.1 summarizes the different
RFID communication standards and their typical characteristics at a glance.

Table 25.1 RFID: A Classification

|  | Low Frequency (LF) | High Frequency (HF) | Ultrahigh Frequency (UHF) |
| Typical frequency band | 30-300 kHz | 3-30 MHz | 300 MHz-3 GHz |
| Typical RFID communication frequency | 125 and 134.2 kHz | 13.56 MHz | 433.92 MHz, 860-960 MHz, 2.45 GHz |
| Approximate read range | <1 m | <2 m | 433.92 MHz: up to 100 m; 2.45 GHz: 1-10 m; 860-960 MHz: 0.5-5 m |
| Typical data rate | <1 kbps | <25 kbps | 433.92 MHz: up to 30 kbps; 2.45 GHz: up to 100 kbps; 860-960 MHz: up to 30 kbps |
| Typical power source | Passive RFID | Passive RFID | 433.92 MHz: active RFID; 2.45 GHz: active and passive RFID; 860-960 MHz: passive RFID |
| Important characteristic near water and metal | These signals penetrate water but not metal | These signals penetrate water but not metal | These signals can neither penetrate water nor metals |
| Typical application | Animal ID, automobile wireless access | Smart labels, contactless travel cards, and security cards | Document tracking, inventory tracking, and military applications. Most widely used RFID frequencies |

The power that can be transferred across variable depth through the human skin and tissue
is not equal over a wide range of frequencies. Several human torso models have been developed
[15-19], and the loss across several frequencies is available in literature. As a rule of thumb,
from all the different human torso simulators, it is observed that there is a greater loss as the
carrier frequency increases [20]. This is mainly because ultrahigh frequency (UHF) and higher
frequencies are extremely sensitive to water and humidity. The human skin and tissue absorb the
electromagnetic (EM) radiation at these frequencies contributing to higher losses (radiated as heat)
in the body medium. This encourages the use of low frequency (LF) and high frequency (HF) in
health-care industry when penetration into the body is necessary. But the disadvantage is that the
data rate and the communication range at these frequencies are very limited. The communication
frequency of sensor network for telemedicine is therefore very crucial.
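
The band boundaries and the water-penetration characteristic listed in Table 25.1 can be captured in a small lookup, which makes the frequency trade-off easy to check for a given carrier. This is a sketch only: the function and dictionary names are hypothetical, and the boundaries are simply the typical frequency bands from the table.

```python
# Typical frequency bands from Table 25.1, in hertz (inclusive bounds assumed).
RFID_BANDS = {
    "LF": (30e3, 300e3),
    "HF": (3e6, 30e6),
    "UHF": (300e6, 3e9),
}

# Per Table 25.1: LF and HF signals penetrate water, UHF signals do not.
PENETRATES_WATER = {"LF": True, "HF": True, "UHF": False}


def rfid_band(frequency_hz):
    """Return 'LF', 'HF', or 'UHF' for a carrier frequency, or None if outside."""
    for name, (low, high) in RFID_BANDS.items():
        if low <= frequency_hz <= high:
            return name
    return None


for carrier in (125e3, 13.56e6, 433.92e6, 915e6, 2.45e9):
    band = rfid_band(carrier)
    print(f"{carrier:>14,.0f} Hz -> {band}, penetrates water: {PENETRATES_WATER[band]}")
```

A design that must reach an implanted sensor would, by this table, favor an LF or HF carrier and accept the lower data rate and shorter range.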


25.1.5 RFID Communication Protocols Extended


Today, most office buildings, educational institutions, and commercial stores are hot spots, where
several wireless systems can be connected to the Internet [21] via a wireless interface. These systems
transfer data according to the IEEE 802.11 communication protocol at 2.45 GHz. The IEEE 802.11
protocol (only as an example), while providing high data rates, includes recurring packet header and
footer information and an initial high-volume data exchange between the wireless system and the
wireless router to authenticate and initiate further data transfer [22]. These protocol requirements,
along with the multiple layers added for ease of communication and the resulting overhead, make
IEEE 802.11 and other long-range communication protocols such as GSM, CDMA,
and GPRS unattractive for communication where the time available for data transfer is critical,
i.e., on the order of microseconds. For example, communicating with automobiles traveling at
high speeds on a highway is a time-sensitive application. Another time-sensitive application is data
acquisition from multiple sensors in a network that needs frequent calibration. Applications where
the transmitter and receiver cannot be active for extended intervals of time due to power limitations
[23-25] require a simple yet secure protocol for quick data transfer that is also standardized.

An attractive alternative is, for example, the EPCglobal™ Class-1 Generation-2 protocol [26],
popularly known as the Gen-2 protocol. This protocol, originally developed to communicate with
only passive RFID tags, is now evolving into a communication link with greater application potential
[27], one example being the maintenance of an intermittent-connection wireless network. Such a
network is usually a star topology, where the connection between the central
server and the system(s) is not continuous. Further, by replacing or modifying the wireless front
end of the system with a suitable alternative, it is possible to extend this technology to replace or
assist wired communication links such as the Intel-invented USB. To simplify, the internationally
accepted Gen-2 protocol can be used as a communication protocol not only between passive RFID
tags and interrogators but also among other short-range wireless sensor networks. The same
concept is also applicable to other RFID communication protocols such as ISO 18000-2 [28] and
ISO 18000-3 [29] and is currently being researched [27].


25.2 Sensors for Telemedicine 


25.2.1 Biomedical Sensors versus Environmental Sensors 


The fundamental difference between environmental sensors and biomedical sensors arises from
the difference in the behavior of electric and electromagnetic signals in the atmosphere versus
inside and near the human body. Electromagnetic energy interacts with the moisture content of the
human body, and this is the main concern when designing, implanting, and communicating with
biomedical sensors.

Biomedical sensor design and deployment differ from those of traditional environmental sensors in
terms of material safety, communication, replacement, maintenance, lifetime, power, size,
interference, and noise. This list is not exhaustive but covers the basic engineering challenges.

The materials used in biomedical sensors that are either implantable or wearable have to be 
human friendly and nontoxic even after long-term exposure and contact. The different fluids and 
enzymes in the human body react differently with a variety of metals, ceramics, polymers, and 
plastics. Some materials are absorbed by the body after prolonged exposure. Common allergies to 
safe materials are also of concern. 

As mentioned before, the biggest challenge in designing biomedical sensors is communication
with the sensor. Wired communication is generally not preferred, not just for aesthetic
and convenience reasons but also for safety and health reasons. Wireless communication is greatly
challenged by the moisture content in the human body. The absorption of electromagnetic energy
by the human body increases as the frequency in the electromagnetic spectrum increases. Selection
of the frequency for communication is therefore critical. Communication with implanted sensors
typically uses the volume conduction technique. The biocompatibility of this technique is still under
research. Sensors that are passively powered have to be designed so that they can operate reliably
at low power levels, as the electromagnetic energy available for the sensor to harvest
is limited compared with that available to an environmental sensor. The energy transmitted from
the sensor has to be sufficiently high to account for the absorption by the human body while
remaining within human tolerance limits.

The size of the sensor is also of concern, whether it is implantable or wearable. Implantable 
sensors are limited in size for obvious reasons. Wearable sensors have to be small enough not to 
cause inconvenience to the patient even after long-term use. 

Noise among nonsynchronized biomedical sensors is of concern from an application layer
perspective. The interference of electromagnetic energy and its harmonics with the electrophysiology
of the human heart and brain is still under research and not conclusively established, but it is
definitely of concern.

Examples of environmental sensors (not biomedical sensors) are sensors with applications in
environmental monitoring, agriculture, biological process detection and monitoring, food processing,
and pharmacological industries.

Examples of biomedical sensors include active implantable medical devices, cardiac rhythm
management devices, and vital sign(s) monitors.


25.2.2 Classification of Biomedical Sensors 


A biomedical sensor can be defined as a transducer for measuring a physiological variable. Examples
of physiological variables include body temperature, blood flow, blood velocity, electromyographic
(EMG) signals, electroencephalogram (EEG) signals, and electrocardiogram (ECG) signals.
Biomedical sensors are the basic building blocks of diagnostic medicine, and therefore communicating
with patients and biomedical sensors at great distances via existing or future data networks
is inevitable for the advancement of telemedicine. Among the latest trends in health care,
self-testing is on the rise. This trend is driven by the desire of patients and physicians alike to
perform instantaneous diagnosis and to replace the external, lengthy diagnosis model with
the point-of-care model.

Biomedical sensors can be classified based on their application to in vitro or in vivo measurements.
Sensors used primarily in laboratories and diagnostic clinics can be classified under the
in vitro category. In vitro sensors can be further classified into physiological sensors and
pathological sensors. Physiological sensors measure electrolytes, enzymes, and biochemical
metabolites in blood; pathological sensors, as the name suggests, measure or detect pathogens in
the blood. Biomedical sensors for measuring pressure, flow, and concentration of gases are used
in vivo (Figure 25.6).

Biomedical sensors can also be classified based on the quantity being measured, as
physical, electrical (bio-potential), bio-analytical (chemical), gaseous, or
optical sensors. Note that in all electronic sensors, the quantity being measured is
converted into an electrical signal (either voltage or current) by a transducer. For example, an
oximeter converts blood SpO2 data into a current signal using light at a particular infrared frequency.
According to the aforementioned classification, an oximeter is classified as a gaseous sensor,
not as an electrical sensor. Each sensor type is explained briefly below.

Physical sensors are typically used to measure the change in position of an object or medium.
They are used in measuring or quantifying changes in dimensions, pressure, force, or temperature.
Examples of apparatus that can be classified as physical sensors include blood pressure monitors,
electromagnetic blood flow monitors, and thermometers.

Figure 25.6 Classification of biosensors.

The purpose of electrical or bio-potential sensors is to measure the ionic potential generated 
inside the human body. Examples of apparatus that can be classified as bio-potential sensors are 
ECG, EMG, and EEG. 

Bio-analytical sensors are primarily used to measure concentrations and traces of enzymes and
bacteria. These sensors can be further classified as enzyme-based sensors and microbial-based
biosensors. Most enzymes react only with specific chemicals present in simple form or as a complex
compound. The action of specific enzymes can be used to construct a wide variety of biosensors.
A typical example is a glucose sensor that uses the enzyme glucose oxidase. Microbial sensors are
used for controlling biochemical processes in environmental, agricultural, food, and pharmaceutical
applications. These sensors typically involve the assimilation of organic compounds by the
microorganism, followed by a change in respiration activity or the production of specific
electrochemically active metabolites such as hydrogen, carbon dioxide, or ammonia.

Knowledge of a patient’s arterial blood gases, such as oxygen and carbon dioxide, is important in
medicine for sustaining the patient using mechanical ventilation or drugs. Several
chemical, optical, and temperature transducers are available to quantify the percentage of oxygen
and carbon dioxide in the blood.

Optical sensors make use of the dispersion and diffraction of certain frequencies of the visible,
ultraviolet, and infrared spectrum. The change in absorbance, reflection, scattering, polarization,
or refractivity of light through a biological medium is converted into quantifiable data by a
transducer. These sensors are used to measure the health or healing rate of implanted tissue,
among several other applications.


25.2.3 Sensors for Oral Telemedicine 


Like any other field of life science, dentistry has been continuously evolving. Among the many
improvements in the field that have enhanced speed, accuracy, and treatment modalities, sensors
have also carved a niche. Because of their miniature dimensions, sensors barely require any
camouflage and are both comfortable and effective in modern-day treatment.

The role of oral sensors in telemedicine has been established for over a decade. With the
advancements in the production cycle of microelectromechanical system (MEMS) devices, the role
of sensors in oral medicine for telemedicine is once again a topic of interest for industry and
academia.

In orthodontic treatments, relapse generally occurs due to mild inaccuracies in the angulation
of teeth, which in turn are caused by imprecise application of the pressures that angulate the teeth.
Sensors aid in continuously monitoring these pressures and in indicating the precise
time for altering the pressure(s) in order to achieve definite tooth movement and hence successful
treatment.

Sensors are used to protect infants from sleep apnea by quantifying their breathing and
respiration. Continuous monitoring helps the staff of pediatric centers to ensure the
safety of each infant individually with minimal effort and to prevent sudden infant deaths.

A brief literature survey shows that sensors in the oral cavity for long-term use primarily include
orthodontic sensors and oral airflow sensors [30,31]. Sensors in development include pH sensors for
cavity detection and for monitoring the osseointegration of oral implants and the levels of various
microorganisms after flap surgeries and graft placement surgeries.


25.3 Telemedicine Models 


A telemedicine network can be broadly classified as a real-time telemedicine network or a
continuous monitoring telemedicine network. As the names suggest, a real-time network is primarily
used when the patient and the medical personnel communicate with each other in real time. The
continuous monitoring model is used when a patient requires constant or frequent monitoring
of his/her vitals or any other critical information as seen fit by the medical personnel for his/her
particular condition. A third model, called the hybrid model, can also be used for a telemedicine
network. This model is a combination of the real-time model and
the continuous monitoring model. In the hybrid model, it is possible to monitor the patient’s
vitals continuously and, when required, communicate with the patient and make changes to the
sensor implanted in or attached to the patient. The modules of the real-time and continuous
monitoring models are explained further in the next two sections.


25.3.1 Real-Time Model 


As explained before, this model is adopted when the medical personnel and the patient communicate
with each other in real time. The communication here is not just the audio or visual data of the
patient explaining his or her symptoms but also the vitals or any other biological information
captured and transmitted by the sensor to the medical personnel via the telemedicine network. The
medical personnel can adjust, in real time, the sensor(s) and/or other medical electronic equipment
attached to or implanted in the patient and get immediate feedback from the patient.

Consider the following example of a real-time model in telemedicine: adjusting an implanted
cardiac defibrillator (ICD). An ICD is necessary for patients with a heart condition in which the heart
needs help to pace (or beat) at a consistent rate. The ICD provides the beat (electrical signal)
when the heart fails to do so. It also slows the beating heart by providing an electric shock to the
heart when it is pacing (beating) too fast, which is life threatening to the patient. Patients with
implanted ICDs have to make regular visits to the cardiac clinic to adjust the settings on the ICD.
The medical personnel adjust the ICD so that it can maintain a healthy heart rate based upon
not only the medical tests and ECG results but also the experiences and activities of the patient
(Figure 25.7).

Figure 25.7 Medtronic ICD.

A telemedicine communication network will greatly benefit the patient and the medical personnel
in this case. The patient saves the time, energy, and resources otherwise spent on
frequent visits to a cardiac clinic. The medical personnel benefit from not having to go through
the hassle of accommodating a patient at every visit, which is also cost efficient.

A typical real-time model is explained here for the case where the data link between the sensor and
the existing data networks is an RFID system. It is possible to complete the network by using other
communication protocols and systems, such as a Wi-Fi link between the sensor and the existing
data networks. The sensor can either be implanted inside the patient or be worn by the patient.
The transceiver design on the sensor depends greatly on whether the sensor is inside the body or
outside. It is well documented that the human body, because of its moisture content, absorbs
electromagnetic energy, making it a challenge to transmit modulated electromagnetic waves
through it. When the sensor is implanted within the patient's body, the transceiver is typically a
separate entity from the sensor and is worn by the patient. Data communication between the sensor
and the transceiver is wired or uses volume conduction techniques [32,33] (Figure 25.8).

A real-time model consists of the patient equipped with a sensor. The sensor is either integrated
with or interfaced to an RFID tag. The RFID tag communicates with an RFID interrogator.
It is also possible for one RFID interrogator to communicate with several RFID tags, each associated
with a different patient. Such networks are used in a hospital setting. The data collected by the
RFID interrogator are transmitted to a secure terminal that can be accessed only by authorized
medical personnel. The data transmitted by the real-time model are private, highly sensitive, and
privileged information. This model faces the same security threats as other RFID and WAN
networks [34,35]. There are several documented solutions for hardening these networks,
and these solutions can be adapted as required to the real-time telemedicine communication
model.

The model can include an optional communication channel between the patient and the medical
personnel that may or may not be considered part of the real-time model. This communication
channel can be as simple as a telephone or postal mail to communicate the session summary.

Figure 25.8 Real-time model for telemedicine network.


25.3.2 Continuous Monitoring Model 


This model is adopted when it is necessary to monitor the patient's condition continuously. The
sensor can continuously (or at regular intervals) capture the required data and store them in memory.
The sensor transmits the data to the RFID interrogator when it is in range. In this model, the
memory required depends on the frequency at which the sensor acquires data and on how often
the interrogator is available for the sensor to transmit the information.
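
The dependence of memory on acquisition rate and interrogator availability can be illustrated with a short calculation. The following Python sketch is not part of the original design; the function name, the 4-byte reading size, the 8 h worst-case gap, and the 20% safety margin are all assumptions chosen for illustration.

```python
import math


def required_buffer_bytes(sample_rate_hz, bytes_per_sample, max_gap_s, margin_pct=20):
    """Estimate on-sensor memory needed between two interrogator contacts.

    sample_rate_hz   : how often the sensor acquires a reading (assumed value)
    bytes_per_sample : storage per reading (assumed value)
    max_gap_s        : longest expected time without an interrogator in range
    margin_pct       : hypothetical safety margin, in percent
    """
    if sample_rate_hz < 0 or bytes_per_sample < 0 or max_gap_s < 0 or margin_pct < 0:
        raise ValueError("all arguments must be non-negative")
    samples = math.ceil(sample_rate_hz * max_gap_s)
    total = samples * bytes_per_sample
    # Integer ceiling division keeps the result exact.
    return -(-total * (100 + margin_pct) // 100)


if __name__ == "__main__":
    # One 4-byte reading per second, interrogator reachable at least every 8 h.
    print(required_buffer_bytes(1.0, 4, 8 * 3600))
```

Halving the acquisition rate or doubling the frequency of interrogator contact halves the estimate, which is the trade-off described above.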

Consider the following example of a continuous model in telemedicine: monitoring the vitals
of a patient suffering from or prone to pneumonia. The vitals can include blood oxygen saturation
(SpO2, %), body temperature, and the heart rate in beats per minute (bpm) of the patient. These
vitals can be measured by using a pulse oximeter. The pulse oximeter can be wearable (called a
finger pulse oximeter) or implanted inside the patient. The pulse oximeter acquires the necessary
vitals and transmits the data for the perusal of the medical personnel. In this case, the medical
personnel do not have to alter the settings of the pulse oximeter at regular intervals. A continuous
monitoring model can also be applied to arrhythmia detection. For patients suspected of having
arrhythmia, an external sensor is typically attached for 24-48 h. The maximum time for which the
sensor can be attached to the patient is limited by the maximum memory available on the sensor.
By transmitting the data acquired by the arrhythmia sensor via the telemedicine network, the
patient can be monitored for a longer time. A continuous monitoring model typically transfers
data in one direction (Figure 25.9).

Figure 25.9 Finger pulse oximeter.

The continuous monitoring telemedicine communication network is beneficial to the patient,
who can be provided quality health care from the comfort of home. The medical personnel
benefit from having an extra bed available at the hospital.

Again, a typical continuous monitoring model will be explained using an RFID communication 
link. Other options are available including Wi-Fi. 

The model consists of one or more patients, each equipped with a sensor integrated with an
RFID tag. The tag transfers the sensor data to the RFID interrogator. The interrogator transfers
the data to a secured hospital server via existing data networks. The data can be analyzed by the
medical personnel as necessary. Since the data communication is typically one way, an additional
communication channel becomes necessary for communication between the patient and the
medical personnel. This can be a simple telephone line or an audiovisual communication channel
(Figure 25.10).


25.3.3 ISO 18000-6c and ISO 18000-7 for Telemedicine Sensors 


Internationally accepted RFID standards such as the EPC Radio-Frequency Identity Protocols
Class-1 Generation-2 UHF RFID Protocol and the ISO/IEC 18000-7 standard were developed with
provisions to communicate with and control a sensor attached to the RFID tag. Without replicating
the entire standards, this part of the chapter discusses the provisions in the two aforementioned
standards for communicating with optional sensors attached to RFID tags.

A typical Gen-2 tag goes through the singulation process to enter the open or secured state
of operation. In this state, the standard provides for communication with the attached
sensors using custom commands. A total of 256 custom commands are available in the Gen-2
standard. Each custom command is 16 bits. The operation codes of custom commands start at
1110000000000000 and end at 1110000011111111. Custom commands can be used only as
point-to-point commands by the interrogator to communicate with a single tag. The memory of
a typical Gen-2 tag is mapped into four banks: Reserved, EPC, TID,
and User, with bank codes 00, 01, 10, and 11, respectively. The size of the user memory bank and
its organization are not limited by the protocol. The user memory bank can also be shared by the
sensor(s).

Figure 25.10 Continuous model for telemedicine network.
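
The custom command range and the bank codes quoted above can be written down directly. The Python sketch below only encodes those numbers; the constant and function names are illustrative assumptions and are not identifiers from the Gen-2 standard or from any RFID library.

```python
# Custom command operation codes as quoted in the text (16 bits each).
CUSTOM_FIRST = 0b1110000000000000  # 0xE000
CUSTOM_LAST = 0b1110000011111111   # 0xE0FF

# Gen-2 memory bank codes as quoted in the text.
MEMORY_BANKS = {0b00: "Reserved", 0b01: "EPC", 0b10: "TID", 0b11: "User"}


def is_custom_command(opcode: int) -> bool:
    """Return True if a 16-bit operation code lies in the custom command range."""
    return CUSTOM_FIRST <= opcode <= CUSTOM_LAST


def custom_opcode(index: int) -> int:
    """Return the operation code of the index-th custom command (0..255)."""
    if not 0 <= index <= 255:
        raise ValueError("index must be in 0..255")
    return CUSTOM_FIRST | index


if __name__ == "__main__":
    print(CUSTOM_LAST - CUSTOM_FIRST + 1)   # number of custom commands
    print(format(custom_opcode(5), "016b"))
    print(MEMORY_BANKS[0b11])
```

The lower eight bits distinguish the 256 custom commands, while the upper eight bits are fixed at 11100000.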

A typical ISO 18000-7 active tag goes through the wake-up and collection process to enter the
awake state. In this state, the standard provides for communication with the attached
sensors using custom commands. The number of custom commands available is not clearly
described in the standard, which is still young. Each command (custom or mandatory) is 8 bits.
According to the current standard, any operation code not defined by the ISO 18000-7 standard
can be used as a custom command. Sensor data have to be stored in the UDB Application Extension
Block format. Since the data are stored in the tag’s memory bank (whose organization is not
stringent), they can be retrieved by using the available memory read commands or custom commands.
The sensor data format is specified by the IEEE 1451 [36] and ISO/IEC 24753 [37] standards. The
physical interface between the RFID tag and the sensor is described in the IEEE 1451.7 and
ISO/IEC 24753 standards.


25.4 Current Research in Telemedicine


The current research in telemedicine at the RFID Center of Excellence, University of Pittsburgh,
includes developing a finger pulse oximeter with remote monitoring capability, an implantable Doppler
flowmeter with remote monitoring capability, and a pacemaker programmer with remote control
capability. This section introduces the functionality of each of these devices
and its scope in telemedicine.


25.4.1 Finger Pulse Oximeter 


The design is an extension of a commercially available finger pulse oximeter. For the prototype
development, the MD300C2 oximeter from Choice Med was selected. The commercial oximeter is
interfaced to the CC2510 mini development board, an RF System-on-Chip (SoC) solution from
Texas Instruments.

Finger pulse oximeters use an infrared and a red LED to transmit light through the fingertip 
of a patient. The light that exits the fingertip of the patient falls on a photo-sensor. The relative 
attenuation of the two different frequency ranges of light (infrared and red) at the photo-sensor 
corresponds to the oxygen saturation and heart rate of the patient. 
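
As a rough illustration of how relative attenuation can be mapped to oxygen saturation, the Python sketch below uses the "ratio of ratios" approach with a linear calibration that is often quoted in introductory texts. Neither the formula nor its coefficients (110 and 25) come from this chapter or from the MD300C2; they are assumptions for illustration only, and real devices rely on empirically calibrated curves.

```python
def ratio_of_ratios(ac_red, dc_red, ac_ir, dc_ir):
    """Ratio of the pulsatile (AC) to steady (DC) signal at red vs. infrared."""
    return (ac_red / dc_red) / (ac_ir / dc_ir)


def spo2_estimate_percent(r):
    """Illustrative linear calibration (assumed): SpO2 ~ 110 - 25 * R, clamped to 0..100."""
    return max(0.0, min(100.0, 110.0 - 25.0 * r))


if __name__ == "__main__":
    # Hypothetical photo-sensor amplitudes in arbitrary units.
    r = ratio_of_ratios(ac_red=0.01, dc_red=1.0, ac_ir=0.02, dc_ir=1.0)
    print(r, spo2_estimate_percent(r))
```

The heart rate, by contrast, would be obtained from the period of the pulsatile component rather than from its amplitude ratio.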

The finger pulse oximeter has an onboard microcontroller unit (MCU) that is used to both 
drive the infrared and red LEDs and measure the analog voltages produced by the photo-sensor. 
After calculating the oxygen saturation level and heart rate, these values are sent to an LCD screen 
via a serial peripheral interface (SPI) bus from the MCU. 

The SPI data from the MCU to the LCD are transmitted by the CC2510 to another CC2510 
that replicates the data as seen by the finger pulse oximeter. For proof of concept, the initial design 
transmits the SPI data from the MCU to the LCD via the Texas Instruments (TI) proprietary 
communication protocol at 2.45 GHz. The typical data rate required to transmit the necessary 
information is between 100 and 250 kbps. Using compression algorithms and sampling the data 
in regular time intervals, the required data rate can be significantly decreased. 
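
To see how much periodic sampling can reduce the data rate, consider sending only the computed values at fixed intervals instead of relaying the raw SPI stream. In the Python sketch below, the payload size, the framing overhead, and the one-second interval are hypothetical numbers, not measurements from the prototype.

```python
def periodic_report_rate_bps(payload_bytes, overhead_bytes, interval_s):
    """Average bit rate when one small report is sent every interval_s seconds."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return 8 * (payload_bytes + overhead_bytes) / interval_s


if __name__ == "__main__":
    # Assumed: 1 byte SpO2 + 1 byte heart rate, 6 bytes of framing, once per second.
    rate = periodic_report_rate_bps(payload_bytes=2, overhead_bytes=6, interval_s=1.0)
    raw_rate = 100_000  # lower end of the 100-250 kbps range quoted above
    print(rate, raw_rate / rate)
```

Under these assumptions the link carries tens of bits per second rather than hundreds of kilobits per second, at the cost of no longer reproducing the display contents exactly.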

Figure 25.11 shows the prototype finger pulse oximeter with remote monitoring capability. The 
data as measured by the oximeter on the right are replicated by the oximeter on the left. 

The design was extended by incorporating a Wi-Fi link into the system as shown in Figure 25.12.
The MD300C2 commercial pulse oximeter from Choice Med was interfaced to the MatchPort NR
Embedded Ethernet Device Server from Lantronix via the CC2510 mini development board, an
RF SoC solution from Texas Instruments. In this system, the data are transmitted from the oximeter
on the right to the oximeter on the left via the Internet.


Figure 25.11 Wearable pulse oximeter for telemedicine with TI’s link.

Figure 25.13 Wearable pulse oximeter for telemedicine with Bluetooth link.


Further development of the system includes incorporating a Bluetooth link into the system as
shown in Figure 25.13. The MD300C2 commercial pulse oximeter from Choice Med was interfaced
to the Bluetooth Mate Gold (WRL-09358) Bluetooth adaptor from SparkFun Electronics via the
CC2510 mini development board, an RF SoC solution from Texas Instruments. In this system,
the data as seen by the oximeter on the right can be transmitted via the cellular network to the
oximeter on the left.

The telemedicine systems shown in Figures 25.11 through 25.13 are examples of the continuous
monitoring model. The information about the patient’s vitals is regularly monitored by the medical
personnel. The data flow is from the patient to the medical personnel.


25.4.2 Implantable Doppler Flowmeter 


The Doppler flowmeter is an implantable continuous Doppler device capable of wirelessly
transmitting blood flow information in real time to a remote receiver. As part of the University of
Pittsburgh’s Telemedicine initiative, this implantable Doppler flowmeter allows for remote
monitoring of vitals (Figure 25.14).


25.4.3 Remote-Controlled Pacemaker Programmer 


In this system, a commercial programmer for pacemakers and implantable cardiac defibrillators from
Medtronic, the Medtronic CareLink 2090, is interfaced to the MatchPort NR Embedded Ethernet Device
Server from Lantronix via the CC2510 mini development board, an RF SoC solution from Texas
Instruments. This system is shown in Figure 25.15. The programmer on the left is operated by
the medical personnel. The operations are transmitted to the programmer on the right via the
Internet. The programmer on the right replicates the operations performed by the medical personnel
on the programmer on the left. Any interactions between the programmer on the right and the
pacemaker implanted in the patient are replicated by the programmer on the left.

Figure 25.14 Implantable Doppler flowmeter.

Figure 25.15 Prototype version of Medtronic CareLink® 2090 with remote control capability.

The future developments of this system will include a video link between the patient and the
medical personnel.

This system is an example of a real-time model in telemedicine. The data flow both ways, i.e.,
from the medical personnel to the patient and from the patient to the medical personnel.

To summarize, telemedicine has the innate ability to significantly improve the quality of
healthcare [38-43]. The continuous monitoring networks of today are only the beginning of a network
whose potential, infrastructure, and research scope are as yet unimaginable.


26.1 Introduction


Various means exist today to monitor, ensure the safety of, and control access to public and private 
areas, both large and small. Such means include video monitoring, infrared (IR) moving object 
detectors, and “electric eye” tripwire approaches with IR signals across key pathways. The concepts 
presented are alternative means to monitor a security zone in ways that are difficult to counter and 
yet are highly automated and relatively inexpensive. The concepts, in combination, also appear 
scalable and, thus, can have commercial applications for small or very large security businesses in 
addition to some special applications such as oil pipelines. 

In the last 4 years, we have published several papers that investigate and evolve design concepts 
of highly automatic, autonomous intrusion detection and tracking sensor networks. Our first 
approach was to develop a very inexpensive option of tiny, expendable sensors (called “pebbles”) 
that emit extremely low-power microwave tones to cue their neighbors, exhibit swarming behavior, 
and are scalable, perhaps to hundreds of square miles. 

This initial work investigated the viability of using simple tones for signaling and combining 
tones from multiple nodes to track detected objects. We will briefly discuss the architecture and 
implementation of a low-cost prototype using MICAz motes with acoustic sensors as surrogates. 
The prototype was used to demonstrate and validate the swarming behavior of a field of “pebbles” 
in a physical environment and to explore the effect of various sensing parameters and network 
configurations. 

We have also developed the design concept and requirements for a higher cost, more sophisticated,
second concept featuring multifunction array lidars (MFALs) with novel electronic scanning
laser array apertures allowing near-simultaneous, interleaved functions including detection,
tracking, and cueing. The network of array lidars provides collaborative automatic detection, tracking,
acquisition cueing, and intruder-type identification, as well as free space optics (FSO)
communication. A lidar consists of four optical phased arrays, each with about 1 million radiating
elements in a 1 cm² aperture that enable electronic beam steering analogous to microwave array
antennas. The four arrays are on four sides at the top of a square cross-section mast and provide
full 360° azimuth coverage. A small number of such MFAL nodes can perform target detection over a
significant area, but this approach is not considered as easily scalable as the first concept of
swarming pebbles.

Finally, we merge the two approaches to optimize the high performance of the lidar approach 
and the cost effectiveness of the swarming pebble network. The concept is to use a very low-cost 
field of pebbles for wide area coverage that cues the more precise inner zone MFAL network. The 
pebbles will provide coverage in the wide open, exterior spaces, inherently scalable to cover up to 
hundreds of square km, while the MFALs will be focused on the inner layer and choke points of 
a few hundred square meters or a few square km in area. We develop requirements and provide 
the calculations on the feasibility of this combined approach, including automated cueing and
hand-off between pebbles and MFAL zones. We also briefly discuss the idea of mobile patrolling
MFAL mini-vehicles that follow or converge on the intruders. 


26.2 Tone-Based Swarming Detection Network 
26.2.1 Concept 


Emergence and swarm intelligence have been widely discussed topics over the last several years. An 
attractive concept is to engineer a collective intelligence demonstrated vividly in natural swarming, 
such as the construction of termite mounds where swarms are the prime movers and no organizing 
principles are evident. For such swarms to be effective, however, certain characteristics appear 
necessary: diversity of perspective, independence, decentralization, and aggregation. In short, there 
must be exchange of information with multiple viewpoints, and the solution must be an aggregation 
of these views. 
The swarming sensor network that we have conceived involves several features: 


■ Swarming: The ability of the network to focus in an ad hoc manner based on a collective
response to inputs to each individual component.

■ Distributed intelligence: A collection of elements appearing to respond intelligently,
although no single element possesses the information or orchestrates a response.

■ Inferential signaling without protocol: The nontraditional conveyance of information that
may consist of inferences from simple signals or behaviors, i.e., “body language” from near
neighbors. In our case, we will use a microwave tone as the signal.

■ Chaotic behavior: The system seemingly behaving in a manner that borders on
unpredictable or unstable.

■ Emergence: Effects and behaviors appearing at higher levels that are not explicitly evident
in lower-level component designs.

■ Multiple systems: Including a sensor system, signaling system, and remote sensor field
monitor.


This system concept was conceived to approach the simplest, most economic, and lowest risk means 
of establishing a sensor network. 

Figure 26.1 illustrates the essential elements of the network. A large number of nodes, or
“pebbles,” is randomly distributed, for example, air dropped or individually placed, with the
condition that there is sufficient density so each pebble is within overlapping sensor and
communication range of its near neighbors. The pebble field surrounds an installation, pipeline, or
building to be protected, for example. Although disguised as pebbles, even if they are discovered,
they are far too numerous to gather.

Each sensor node contains (a) a small, specific sensor, for example, acoustic, radio frequency 
(RF), chemical, optical, or biological; (b) a power supply; (c) a microwave communications 
transceiver (two-way communications); (d) a controller chip; (e) a suitable container to disguise 
and/or otherwise protect the sensor node components; and (f) an optional solar array. One or more 
remote directive transceivers monitor the spectral power density levels and transmission locations 
of this “pebble field” and, as a design option, can transmit over a “control” channel to modify some 
number of nodes’ programming, for example, detection thresholds. 

The pebble nodes’ sensors are passive, i.e., nonemitting for minimum power consumption. All
the pebbles could contain the same type of sensor, but, more generally, they could contain a mix
of different sensor types, perhaps even two or more sensor types per pebble node. The sensors can
be preset to search for specific characteristics, such as to detect certain vibration frequencies of the
human voice or visual motion indicative of humans, or they may be set to detect any sound above
average background or any moving object, etc. The detection threshold can also be set. The set
threshold would have a reasonably insensitive “cold detection” value to ensure a low false alarm
rate. However, the sensor could be cued to become more sensitive if it receives a cueing signal from
its node neighbors, indicating that one or more neighbors have detected an intruder. The requisite
settings will be discussed later.

Figure 26.1 Elements of the network.

Assume one pebble senses an event according to its preset cold detection threshold. It will emit
a weak communication signal in all directions but with intentionally limited range, both to save
energy and to reach only its neighbors, which would likely also have a reasonable probability of
detection (P_D) if the event is real and not a false alarm.

As each sensor makes a subsequent associated detection, it continues to emit a simple signal, 
such as a tone, that can be received by its neighbors. If a correlation of sensor events to an intruder 
incident begins to transpire, the activity of the sensors will naturally increase the total power density 
in that area at the cue tone frequency via their communications transmissions. They will also be 
inherently collaborating and, by their near-neighbor interactions, producing a swarming behavior. 
As long as a sensor node continues to detect, it will send a signal to support continuation of the 
swarming activity. If detections begin to wane, the swarming signals will diminish. 

From a distance, a directional receiver tuned to the frequency band(s) or tone(s) of the pebble 
nodes scans the sensor field to monitor activity. The directional antenna may, for example, 
determine pebble signal activity above normal at a particular direction. The strength of the received 
signal would be proportional to the strength of pebble detection activity, implying a firm indication 
of detecting an intruder. The angle of reception, for example, from a direction-finding antenna, 
indicates the approximate location of the intruder. Multiple remote directional receivers could be 
operated for triangulation to further localize the swarm activity. A similar localization approach 
would be to have pebbles at different locations radiating at different tonal frequencies. The remote 
receiver would be able to link the sensor net activity to a command and control (C2) node for 
action, if indicated. For example, a swarm activity indicating a high confidence of detection could 
result in a decision to intercept the detected intruder. The remote receiver could be manned and
the C2 decision made at that location, or it could be unmanned and operated remotely via a
communications link to a security office. Additional options with this type of operation include
cueing of video cameras and tripping an audible or silent alarm if activity of the nodes exceeds a
threshold, enabling a response by security forces.

Figure 26.2 Event chart.
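
The triangulation step mentioned above can be sketched as the intersection of two bearing lines. The Python example below is a minimal illustration, not the authors' method: it assumes a flat two-dimensional field, bearings measured counterclockwise from the x-axis, and noise-free angle measurements; the function name and coordinates are hypothetical.

```python
import math


def locate_from_bearings(p1, theta1_deg, p2, theta2_deg):
    """Intersect two bearing lines from receivers at p1 and p2.

    Bearings are assumed to be measured counterclockwise from the x-axis.
    Returns the (x, y) estimate of the swarm activity.
    """
    d1 = (math.cos(math.radians(theta1_deg)), math.sin(math.radians(theta1_deg)))
    d2 = (math.cos(math.radians(theta2_deg)), math.sin(math.radians(theta2_deg)))
    denom = d1[0] * d2[1] - d1[1] * d2[0]  # 2-D cross product of the directions
    if abs(denom) < 1e-12:
        raise ValueError("bearings are parallel; no unique intersection")
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (dx * d2[1] - dy * d2[0]) / denom  # distance along the first bearing
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])


if __name__ == "__main__":
    # Two receivers 100 m apart looking inward at 45 and 135 degrees.
    print(locate_from_bearings((0.0, 0.0), 45.0, (100.0, 0.0), 135.0))
```

With real direction-finding antennas, each bearing carries an angular error, so the intersection becomes an uncertainty region whose size grows with range and shrinks as the two bearings approach a right angle.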

Figure 26.2 is an event chart to further illustrate the intruder detection by a field of signaling 
sensor pebbles. Shown are several pebbles, P1 through P4, each with a sensor, such as a microphone 
or IR detector, as well as a microwave signal cue tone communications transceiver. As an intruder 
passes within the sensor range of pebble P1, illustrated by the “X” on the left, P1 detects above a 
preset threshold (D) and transmits a cue tone (T). The tone from P1 is received (R) by pebble P2, 
which, in turn, cues its detection threshold to a lower, more sensitive setting (C). Sometime later, 
as the intruder passes into the detection range of pebble P2, the pebble detects (D) with even less 
signal than for P1 (second “X” from the left). It then sends out a tone that is received by pebbles 
P1, P3, and P4. Pebble P3, which is now set at the cued detection threshold, detects (D) and 
transmits a tone (T) that is received by Pebbles P2 and P4, and so on. A remote receiver may be set 
to detect the tones of, say, five transmitting pebbles. Further, scanning with high antenna gain can 
allow tracking of the intruder’s progress as new pebbles detect and transmit tones, in turn. 


26.2.2 Analysis 


Our analysis of the swarming network concept is based on an existing design. We selected the 
“Mica mote” for consideration because the design exists and could be modified to accommodate 
this concept [1,2]. Although, perhaps, larger, more costly, and higher power than may ultimately 
Table 26.1 Characteristics for Pebble Nodes and Remote Receiver 


Characteristic                        Pebble Node                   Remote Receiver 
Noise figure                          2 dB                          2 dB 
Bandwidth (tone)                      50 kHz                        50 kHz 
Carrier frequency range               2.4 GHz                       2.4 GHz 
Received signal to noise required     12 dB                         12 dB 
  (with noncoherent integration) 
Receive antenna loss                  3 dB                          3 dB 
Antenna gain                          -8 dBi 
Transmit power                        -5 dBm (-30 dBm excursion) 
Receive aperture                                                    0.025 m² (0.25 m² excursion) 

be desired for pebbles with this concept, it represents a design that may be adaptable to inferential 
swarming behavior. We begin by discussing communications connectivity calculations for the mote 
(i.e., pebble) and then describe blockage detection, as a variant of this concept, in Section 26.2.3. 

The mote design description of [1,2] indicates up to a 30 m communication range at a moderately 
high data rate (hundreds of kilobits per second) using the Bluetooth protocol at 2.4 GHz. Based 
on these characteristics and on power consumption information and, further, recognizing that we 
are merely communicating narrowband tones, we postulate the design characteristics for the mote- 
based pebbles and a remote receiver in Table 26.1. We consider nearly omnidirectional sensor and 
communication antennas rather than more complex sector sensors and antennas. We also postulate 
expected propagation loss values between pebbles and the remote receiver, assuming potential 
foliage effects, of up to 15 dB when well within the radio horizon and up to 35 dB at the radio 
horizon [3]. The radio horizon range depends on receiver heights, for example, 7.1 km between a 
surface pebble and a 3 m high remote receiver antenna for standard propagation conditions. 
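
The quoted radio horizon can be checked with a short calculation. The sketch below is illustrative only: it assumes that "standard propagation conditions" means the common 4/3 effective Earth radius model, and the function name `radio_horizon_m` and the mean Earth radius value are choices made here, not taken from the chapter.

```python
import math

EARTH_RADIUS_M = 6.371e6  # mean Earth radius (assumed value)


def radio_horizon_m(h1_m, h2_m=0.0, k_factor=4.0 / 3.0):
    """Radio horizon between two antennas at heights h1_m and h2_m.

    Assumes a smooth Earth and the 4/3 effective Earth radius model
    (an assumption about what "standard propagation" means here).
    """
    effective_radius = k_factor * EARTH_RADIUS_M
    return (math.sqrt(2.0 * effective_radius * h1_m)
            + math.sqrt(2.0 * effective_radius * h2_m))


# Surface pebble (height 0 m) and a 3 m high remote receiver antenna.
horizon_km = radio_horizon_m(3.0) / 1000.0
print(f"radio horizon: {horizon_km:.1f} km")
```

Under these assumptions the result is close to the 7.1 km quoted in the text.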

Table 26.1 includes two levels of pebble node transmit power, -5 dBm of the present mote 
design and an excursion to a much lower power of -30 dBm representing a potential advanced, 
very low-power design. Also, two remote receiver antenna apertures are considered: a significant 
gain, directional antenna of a 0.5-m-by-0.5-m area and a smaller, lower gain antenna with a 16 cm 
side dimension. 

Figure 26.3 provides example results from our parametric calculations of the minimum number 
of “pebbles” detectable by a remote receiver for the combinations of pebble transmit powers and 
receiver apertures versus range. We assume the pebbles are placed randomly but well within each 
other’s reception range to induce swarming response. Our calculations confirm a maximum 
communications range between pebbles of about 30 m for -5 dBm transmit power. If we assume that 
each transmitting pebble has a transmit power of P_peb and a transmit gain of G_peb, and there are 
N transmitting pebbles, then for noncoherent combining the collection of transmitting pebbles has 
an average effective radiated power of N P_peb G_peb. The signal-to-noise ratio (SNR) at a distant 
receiver is then 


$$\frac{S}{N} = \frac{N P_{peb} G_{peb} A_{rcv} \sqrt{tB}}{4\pi R^{2}\, k T_{s} B\, L_{p} L_{s}}$$


where 
A_rcv is the receive antenna aperture 
R is the range to the receive antenna 
kT_sB is the noise power, where k is Boltzmann’s constant, T_s is the system noise temperature, 
and B is the receiver bandwidth 
sqrt(tB) is the S/N improvement factor due to noncoherent integration over a time t 
L_p and L_s are the propagation loss and other system losses, respectively 


Figure 26.3 Number of transmitting pebbles versus range for various pebble transmitting 
powers and receiver apertures. 


In Figure 26.3, we assume the parameters of Table 26.1 and a receiver noncoherent integration 
time of 1 s. For ranges well within the radio horizon, we assume a propagation loss of 
15 dB to address fading and foliage effects. The propagation loss will increase significantly at the 
radio horizon and beyond. From Figure 26.3, for a pebble transmit power of -30 dBm and a 
distance of 5 km, a remote receiver with a 3 m antenna height would detect 12 pebbles or more 
with a 0.25 m² aperture. This implies that out of perhaps thousands of pebbles, at least 12 would 
need to radiate, indicating intruder activity, before the remote receiver would detect any response. 
Note that our assumption of noncoherent power combining requires more than a few pebbles to 
be radiating. Experimental verification of the propagation loss and incoherent tone integration will 
be discussed later. Whereas requiring this minimum number of radiating pebbles helps limit false 
alarms, it may also compromise intruder detection sensitivity, for example, if the pebbles are so 
sparse that a human intruder would trigger fewer pebbles than the minimum at any time. Thus, 
swarming network configuration analysis was performed to determine the requisite pebble density 
and remote receiver dynamic detection range for the intruder detection sensor sensitivity, as will 
be described in a later section. From our parametric calculations, we conclude that, for a pebble 
transmit power of -5 dBm, we can detect a few pebbles with a remote receiver with a reasonable 
antenna aperture under significant propagation loss from 1 to 20 km in range. For a much lower 
pebble transmit power of -30 dBm (e.g., to reduce cost and detectability by an adversary intruder), 
a remote receiver with a significant antenna aperture could detect swarms of a few dozen or more 
pebbles out to several kilometers. 
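
The SNR relation and the Table 26.1 parameters can be evaluated numerically. The following is a minimal sketch, not the authors' calculation: it assumes the system noise temperature is 290 K scaled by the 2 dB noise figure, and it treats the 3 dB receive antenna loss as the only "other system loss". All function and parameter names are hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def db_to_ratio(value_db):
    """Convert a decibel value to a linear power ratio."""
    return 10.0 ** (value_db / 10.0)


def snr_per_pebble(p_tx_dbm, g_tx_dbi, aperture_m2, range_m, bandwidth_hz,
                   noise_figure_db, integration_s, prop_loss_db, sys_loss_db):
    """Linear SNR contributed by one pebble (N = 1 in the SNR equation)."""
    p_tx_w = 1e-3 * db_to_ratio(p_tx_dbm)
    # Assumption: T_s = 290 K scaled by the receiver noise figure.
    t_sys_k = 290.0 * db_to_ratio(noise_figure_db)
    noise_w = BOLTZMANN * t_sys_k * bandwidth_hz
    integration_gain = math.sqrt(integration_s * bandwidth_hz)
    losses = db_to_ratio(prop_loss_db + sys_loss_db)
    return (p_tx_w * db_to_ratio(g_tx_dbi) * aperture_m2 * integration_gain
            / (4.0 * math.pi * range_m ** 2 * noise_w * losses))


def min_pebbles(required_snr_db, **link):
    """Smallest N for which the noncoherently combined SNR meets the requirement."""
    return math.ceil(db_to_ratio(required_snr_db) / snr_per_pebble(**link))


# Low-power excursion of Table 26.1 at 5 km with the 0.25 m^2 aperture.
LOW_POWER_LINK = dict(p_tx_dbm=-30.0, g_tx_dbi=-8.0, aperture_m2=0.25,
                      range_m=5000.0, bandwidth_hz=50e3, noise_figure_db=2.0,
                      integration_s=1.0, prop_loss_db=15.0, sys_loss_db=3.0)

print(min_pebbles(12.0, **LOW_POWER_LINK))
```

With these assumed inputs the sketch returns 12, in line with the "12 pebbles or more" read from Figure 26.3 for the -30 dBm, 5 km, 0.25 m² case. Other points on the figure depend on propagation losses near the radio horizon that the sketch does not model.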
Observe that this system is potentially scalable from a few such pebbles to literally millions 
over many square kilometers with no design or settings changes to the network other than perhaps 
to connect a larger network of multiple remote monitor stations. Also observe that the concept 
features three interactive systems. First are the sensors themselves: one or more hosted in the 
pebble chassis. These individual sensors could be quite sophisticated in terms of miniaturization 
and filtering mechanisms, for example, to ensure requisite detection and false alarm probabilities, 
P_D and P_FA, respectively, and to distinguish the intruder of interest from false 
identifications such as wild animals or tumbling debris. The sensors also must provide adequate 
detection range to ensure overlapping coverage with neighboring pebbles’ sensors. The second 
system in the concept is the signaling network. It must have adequate range and detection/false 
alarm performance and maintain continual connectivity among near neighbors. The third system 
is the remote monitor for sensing swarming behavior, with requisite swarming and false swarming 
statistics, which can also transmit commands to reset or change the state of the pebble field. 

As we have mentioned, sensor detection of an intruder could be accomplished by a number of 
means, including passive audio detection or passive IR detection [4]. We begin by stipulating that 
a detection system is designed to provide 12 dB or greater SNR out to a range of 10 m from the 
intruder location. Signal detection theory can then be used to determine probabilities for detection 
and false alarms. For 12 dB or greater sensor SNR, the “cold” probability of detection is greater than 
0.93 for a P_FA of 10^-4. If the sensor has been cued, we assume it lowers the detection threshold to 
increase the P_D. For example, for a 12 dB SNR, the detection threshold can be lowered to achieve 
a P_D of 0.995 for a P_FA of 10^-2. If the detection range is increased to 20 m, the same sensor system 
provides a 6 dB SNR, assuming that free-space spreading propagation conditions are in place. In 
this case, the cold P_D is 0.1 for a P_FA of approximately 10^-4, and the cued P_D is 0.5 for a P_FA of 
10^-2. These passive sensor detection results are summarized in Table 26.2. 
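
The SNR figures quoted above can be sanity-checked with a closed-form approximation. The sketch below uses Albersheim's equation for a nonfluctuating signal in a single look, which is a technique swapped in here for illustration; the chapter itself refers only to standard detection curves. Albersheim's fit is usually quoted as accurate for P_D between 0.1 and 0.9 and P_FA between 10^-7 and 10^-3, so the cued cases lie outside that range and the numbers are indicative only.

```python
import math


def albersheim_snr_db(p_d, p_fa, n_pulses=1):
    """Approximate single-sample SNR (dB) needed for a given P_D and P_FA.

    Albersheim's empirical equation for a nonfluctuating signal with
    noncoherent integration of n_pulses samples (an illustrative choice).
    """
    a = math.log(0.62 / p_fa)
    b = math.log(p_d / (1.0 - p_d))
    return (-5.0 * math.log10(n_pulses)
            + (6.2 + 4.54 / math.sqrt(n_pulses + 0.44))
            * math.log10(a + 0.12 * a * b + 1.7 * b))


def snr_at_range_db(snr_ref_db, ref_range_m, range_m):
    """Scale a one-way SNR with range, assuming free-space 1/R^2 spreading."""
    return snr_ref_db - 20.0 * math.log10(range_m / ref_range_m)


print(round(albersheim_snr_db(0.93, 1e-4), 1))    # cold threshold at 10 m
print(round(albersheim_snr_db(0.995, 1e-2), 1))   # cued threshold at 10 m
print(round(snr_at_range_db(12.0, 10.0, 20.0), 1))  # SNR available at 20 m
print(round(albersheim_snr_db(0.5, 1e-2), 1))     # cued threshold at 20 m
```

Under this approximation, the cold and cued operating points at 10 m both require roughly 12 dB, and doubling the range costs about 6 dB, consistent with the values stated in the text.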

Figure 26.4 illustrates the range relationships among adjacent pebbles using standard P_D versus 
P_FA curves (see, for example, [5]). 

In the network system that signals to cue neighboring pebbles when a pebble declares a cold 
detection, the cue is accomplished by emitting an RF tone that is detected by a neighboring 
pebble. To help prevent a “swarming instability” in which all pebbles are eventually cued and 
yield an excessive false alarm rate, cue tones last only a few seconds (we have initially selected 5 s) 
before turning off and will not be reinitiated unless another detection is made. For an RF tone-based 
cueing system designed to provide 12 dB or greater SNR out to a range of 10 m, signal detection 
theory can then be used to determine probabilities for detection and false alarms (according to 
classical P_D versus P_FA curves). For 12 dB or greater SNR, the P_D is greater than 0.93 for a P_FA of 
10^-4. If we assume that the tone signal variation with range corresponds to free-space spreading, 
the design would then provide a 6 dB SNR at a range of 20 m, which corresponds to a P_D 
of ~0.1 for a P_FA of 10^-4. For free-space propagation conditions, nearest neighbors will have a 


Table 26.2 Zone Range Requirements for Pebbles in a Passive 
Detection Sensor System 


                  Passive Detection at          Passive Detection at 
                  10 m (12 dB SNR)              20 m (6 dB SNR) 
Cold detection    P_D > 0.9; P_FA = 10^-4       P_D < 0.1; P_FA = 10^-4 
Cued detection    P_D > 0.99; P_FA = 10^-2      P_D < 0.5; P_FA = 10^-2 
Figure 26.4 Pebble sensor range performance. 


Table 26.3 Zone Range Requirements for Near-Neighbor Signal 
Cueing System 


                                10 m (12 dB SNR)              20 m (6 dB SNR) 
Communication tone detection    P_D > 0.9; P_FA = 10^-4       P_D < 0.1; P_FA = 10^-4 


high P_D, whereas the next closest neighbors will have a significant reduction in detection probability. 
More severe propagation conditions would further reduce the detection probability beyond the 
nearest neighbors. This analysis thus bounds the range of influence of each pebble on its neighbors. 
Table 26.3 summarizes these requirements. Figure 26.5 illustrates the near-neighbor signaling range 
and associated received signal in a free-space environment. 

Experiments, which provide the basis for the propagation conditions assumed earlier, 
demonstrated free-space spreading for elevated modules and more severe attenuation with 
range for modules located on the ground. The results are illustrated in Figure 26.6. 

A straightforward analysis showed, and associated experimental results confirmed, that the 
incoherent sum of tones from five or more pebbles, received within the directive beam of a remote 
receiver, represents an acceptable likelihood of intruder presence, as will be discussed. 

The zone requirements for the remote monitor could be designed independently from specifying 
its range from the pebble field. They are 


■ P_D for at least five pebbles >0.9 with associated P_FA less than 10^-4 within the spatial 
dimensions of the directive antenna beam 

■ Localization of transmitting pebbles (assuming detecting “swarms” of at least five 
near-neighbors) to within 100 m circular error probable of centroid 


For given ranges between a monitor and the pebble field boundaries, these requirements accommo- 
date derivation of required monitor receiver sensitivity, antenna gain, and antenna beam splitting 
accuracy (e.g., via con-scan or monopulse techniques [6]). 
Figure 26.6 Mote-to-receiver connectivity. 


26.2.3 Intruder Blockage Detection Analysis 


We analytically explored a novel sensing mechanism that is rudimentary yet difficult to counter: 
“intruder blockage detection.” For this concept, each pebble is set to receive not only 
a communication tone but also a blockage-sensing tone at a different frequency. We 
intermix into the pebbles several “illuminator” pebbles that continually transmit a low-power, 
blockage-sensing tone. All other pebbles are set to receive the communication tone as well as 
the blockage-sensing tone. The concept is that pebbles will receive rather constant blockage-sensing 
tone signal power levels unless an intruder passes through the path between the transmitting pebble 
and receiving pebble. 

During the passage of an intruder who blocks the blockage-sensing transmission to a receiving 
pebble, the receiving pebble will detect a significant drop in received signal power for a short period. 
If that occurs, a potential detection is declared, and the pebble emits the 2.4 GHz communication 
tone. Upon receiving the communication tone, the neighboring pebbles increase the sensitivity of 
their blockage signal triggering threshold so they would detect the loss of signal more readily, i.e., 
they are cued to “listen” more carefully. 

Note that countering blockage detection in the microwave band between several illuminator peb- 
bles and many randomly placed receiving pebbles would likely be difficult. Further mitigation could 
be in the form of pebble receiver detection of attempts to “jam” the transmission frequency or pro- 
vision for randomized tone hopping or modulation that would be difficult for an intruder to mimic. 

We performed some preliminary diffraction calculations to determine whether a human intruder 
causes adequate blockage signal loss for detection at representative pebble distances. We 
modeled a human intruder as an infinitely long, vertical cylinder that is 0.3 m in diameter. The 
cylinder’s complex permittivity is that of saltwater to approximate the permittivity of the human 
body. For such a simple shape, vertical signal polarization causes a deeper shadow than horizontal 
polarization by 2-3 dB. However, this model ignores irregularities in human shape and 
composition, irregularities in the surface, and the effects of nearby obstacles, all of which would 
tend to weaken the polarization effect on the blockage. We therefore consider horizontal 
polarization as the worst case. Figure 26.7 plots blockage loss versus distance from the obstacle 
for 3, 10, and 20 GHz blockage-sensing tones. 

The signal drop at 2.5 m distance is about 3, 6, and 8 dB for 3, 10, and 20 GHz, respectively. 
Thus, it appears that using 20 GHz as the blockage-sensing tone provides more effective blockage 
detection, i.e., a sufficient change in signal against a typical environment for reliable detection by 
a receiving pebble without excessive false alarms. 

Figure 26.8 illustrates the idealized blockage “shadow” in two dimensions for 20 GHz. For this 
calculation, we used a parabolic equation computation method described in [3]. The blockage signal 
loss appears to be significant at 5-6 dB, even 10 m behind the blocking cylinder, and some loss, at 
3-4 dB, occurs even at an assumed maximum inter-pebble communications distance of 30 m. We 
therefore conclude that a blockage detection capability operating near 20 GHz may be effective 
against a human intruder. One can set a signal power threshold in the remote receiver requiring some 
minimum number of pebbles to transmit a communications tone to conclude that there may be an 
intruder, thus further reducing the prospects for a false alarm. The stability of the pebble node network must 
Figure 26.7 Power reduction versus distance behind a 0.3 m diameter blocking cylinder for 
horizontal polarization. 

Figure 26.8 Two-dimensional plot of power reduction near a 0.3 m diameter blocking cylinder 
for horizontal polarization. 


be maintained so cueing for greater pebble detection sensitivity does not cause the network to go 
unstable, in which sensitized pebbles continue to detect false alarms after the triggering blockage 
event has ceased. Greater network stability may be achieved with a timeout feature in which the 
transmissions of the detecting pebbles cease after, for example, 5 s and the detection threshold is 
reset to the “cold detection” value. The timeout approach would also conserve node power. 

A preliminary detection and false alarm analysis was performed for the intruder blockage 
detection approach. A noncentral chi-square distribution was used to model the received signal 
plus noise power. For received signals 30 dB above thermal noise power, a single pebble cold 
detection threshold set to detect a drop in signal level of 4-6 dB (below the 30 dB level) will yield 
a very high P_D and very low probability of false alarm. Additional pebble detections correlated 
with the first detection would not appreciably improve the detection performance, but would 
indicate intruder movement through the pebble field. Figure 26.9 plots probability versus SNR. 
The ascending line indicates the probability that the 20 dB signal plus noise exceeds the SNR. The 
descending line indicates the probability that the signal reduced by 6 dB is less than the SNR. For 
received signals 20 dB above the noise, a 6 dB cold detection threshold would be set to provide a 
P_D of about 90% and a false alarm rate of 10^-4. If the pebble then cues neighboring pebbles to 
reduce their threshold to detect a 4 dB drop in signal level, the reduced threshold would be more 
likely to trigger cued detections, and these detections will serve to improve detection performance 
(and indicate intruder movement). If the cold threshold were retained by the neighboring pebbles, 
rather than the cued threshold, further cold detections would not occur as readily and, as a result, 
would not provide intruder movement indication. 

The cueing mechanism for reducing the detection threshold for 20 dB signal to noise may not 
provide better detection performance than other strategies, such as reporting any events beyond a 
very low threshold (like 2 dB) and taking M out of N as a basis for declaring a detection. However, 
Figure 26.9 Probability versus SNR for 20 dB signal case. 


given that the pebbles need to minimize transmission time and power for energy conservation, the 
cueing approach may prove optimal. 

Note that multiple illumination frequencies may be needed to reduce interference associated 
with a pebble receiving the combined signal of multiple blockage-sensing illuminators. For example, 
if a pebble is receiving signals at comparable strengths and at the same frequency from two or more 
illuminators, intruder blockage from one of the illuminators could be masked, or jammed, by the 
signals of the unblocked illuminators. If neighboring illuminators operate on different frequencies 
and each of the detection pebbles is tuned to only one of the illumination frequencies, or, alter- 
nately, could be tuned to discriminate different illuminations, this interference problem could be 
alleviated. 

For the desired detection performance, we mentioned previously that an illuminator pebble 
must provide 20 dB signal to noise at 20 GHz to a receiving pebble at approximately 10 m range. 
We considered the feasibility of a continuously transmitting illuminator from a power consumption 
viewpoint. We estimate that a -15 dBm transmit power is sufficient (assuming omnidirectional 
20 GHz antennas, a 100 kHz receive bandwidth, 3 dB receive noise figure, 3 dB losses on transmit 
and receive, and a 15 dB propagation loss). For an overall efficiency of less than 5%, we estimate 
that the total power consumption could be on the order of 1 mW. 

The mote design description in [1] indicates a 3 W-h battery, which would indicate up to 
3000 h of continuous operation of an illuminator. A 1 cm² solar panel that can generate 10 mW 
of power in full sunlight [1] would extend operation. A pulsed system could also be considered 
to minimize power consumption. Such a system would increase complexity, requiring clock 
synchronization between the illuminating and receiving pebbles. Finally, because the pebbles 
are considered expendable, periodic replacement of blockage-sensing illuminators with depleted 
batteries would likely be economical. 
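
The power and lifetime estimates above follow from simple arithmetic, sketched here as a worked equation. The efficiency symbol η and the choice of exactly 5% as the evaluation point are assumptions made for illustration; the text states only that the efficiency is below 5%.

```latex
\begin{align*}
P_{tx} &= -15\ \text{dBm} = 10^{-1.5}\ \text{mW} \approx 0.032\ \text{mW} \\
P_{dc} &= \frac{P_{tx}}{\eta} \approx \frac{0.032\ \text{mW}}{0.05} \approx 0.63\ \text{mW}
        \quad (\eta = 5\%;\ \text{a lower } \eta \text{ pushes this toward } 1\ \text{mW}) \\
t_{battery} &\approx \frac{3\ \text{W·h}}{1\ \text{mW}} = 3000\ \text{h} \approx 125\ \text{days}
\end{align*}
```

On the same footing, a 10 mW solar panel output would exceed the roughly 1 mW drain by about a factor of ten while in full sunlight.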


26.2.3.1 Network Sensitivity Analysis Results 


The basic assumption of the swarming pebbles concept is that inter-pebble and pebble 
field-to-remote receiver propagation losses are approximately constant and predictable. Pebble 
transmit power can then be set to allow only near-neighbor pebbles to receive cue tones. It may 
turn out that under certain conditions, such as multipath, obstacle blockages, and propagation 
ducting, actual propagation loss varies by tens of decibels over seconds to minutes. 
The following paragraphs describe a bounding analysis of what would occur 
in a swarming pebble field under extreme propagation conditions. Additional network design 
features are then proposed for consideration in the prototypes to accommodate propagation 
variations while retaining network stability and performance. 

Consider a pebble field with 20 m separation over about a 4 km² area. This could be represented 
by 10,000 pebbles covering approximately a 2 x 2 km pebble field or a long rectangle, say 
0.25 x 16 km, along a pipeline or on the periphery of a utility complex such as a power plant. 
Assume also that the pebbles reset to their cold, uncued detection thresholds every 10 s so that we do 
not need to consider cumulative probabilities (and power is conserved). Propagation loss variations 
can greatly alter network performance. For example, if a total swing of ±18 dB in propagation 
loss were to occur, a 20 m nominal communication range between pebbles would increase to 106 m or 
reduce to 2.5 m. Case 1 is an idealization of the former case, and case 2 considers an extreme of 
the latter case. Consider the following limiting cases: 


■ Case 1: Perfect propagation. In this case (for example, strong ducting), if one pebble makes 
a cold detection and emits a cueing tone, all other pebbles receive the cue and set their more 
sensitive cued detection thresholds. As described earlier, a cold detection threshold 
is assumed with P_D ≈ 0.9 and P_FA = 10^-4. A cold detection cues all other 9999 pebbles to 
the cued threshold with P_D ≈ 0.99 and P_FA = 10^-2. 

■ Case 2: No propagation. In this case (for example, extreme blockages), if a pebble makes 
a cold detection and emits a cueing tone, no other pebbles receive the cue. Therefore, all 
10,000 pebbles remain at the cold detection threshold of P_D ≈ 0.9 and P_FA = 10^-4. 

■ Case 3: Intermediate threshold. In this case, assume all pebbles retain a single cold 
threshold of P_D ≈ 0.95 and P_FA = 10^-3, with no cueing. This represents a potential alternative 
state if either case 1 or case 2 is determined to be in effect. 


For each case in Table 26.4, the longer term average number of false alarm detections per 10,000 
pebbles is shown in the second column. Assuming a searching directional antenna for the remote 
receiver that covers 0.1 of the pebble field area (0.4 km²) at any time, column 3 indicates the 
average number of false alarms per antenna beam position. If the remote receiver is set to detect, at 


Table 26.4 Performance Analysis Results 


                  No. of False      No. of False Alarms in Remote    Probabilities for Five Pebbles 
                  Alarms per        Receiver Antenna Beam That       Detected per Beam (Cold plus Cued) 
Case              10,000 Pebbles    Covers 1/10 of Pebble Area       P_D        P_FA 
1                 100               10                               0.9        10^-12 
2                 1                 ~0                               ~0.6       10^-20 
3                 10                ~1                               ~0.8       10^-15 
Nom. op. perf.    1                 ~0                               ~0.9       10^-12 
a minimum, 5-10 pebbles, then 5-10 false alarms among pebbles in a beam would cause a remote 
receiver reception to falsely indicate an intruder. Columns 4 and 5 give the P_D for five pebbles 
within a beam and the P_FA, with a cold detection plus four cued detections for case 1 and five cold 
detections for cases 2 and 3. Table 26.4 also shows the nominal operation performance described in 
this chapter in addition to the three cases. 

Case 1 indicates that a cold intruder detection made by a pebble that cues all other pebbles 
would yield a P_FA of 10^-12 and P_D of 0.9. However, if the pebble making the cold detection sends 
a cue signal that is received by all pebbles in the field, the resulting per-pebble P_FA of 10^-2 implies 
that during the course of the 10 s interval about 10 false alarms per beam would light up all beam 
positions and prevent localization of the intruder. 

Conversely, case 2 with essentially no propagation would result in only cold detections. An 
intruder would undergo a series of five cold detections with a cumulative P_D of 0.6 and P_FA of 
10^-20. This is not a very high P_D. 

If, under conditions of uncertain propagation, all pebbles are ordered, via the remote monitor, 
to set cold thresholds of a P_FA of 10^-3 and a P_D of 0.95, and no cueing is allowed, an intruder 
would be detected with five cold detections with cumulative P_D of 0.8 and P_FA of 10^-15. 
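
The entries of Table 26.4 for the three limiting cases can be reproduced with a few lines of arithmetic. This is a sketch under the assumption that the five detections in a beam are statistically independent, so that per-pebble probabilities multiply; the dictionary layout and the name `summarize` are illustrative choices.

```python
import math

PEBBLES = 10_000
BEAM_FRACTION = 0.1  # the beam covers 1/10 of the pebble field

# Per-pebble (P_D, P_FA) for the five detections required within a beam.
CASES = {
    "case 1": [(0.9, 1e-4)] + [(0.99, 1e-2)] * 4,  # one cold plus four cued
    "case 2": [(0.9, 1e-4)] * 5,                   # five cold detections
    "case 3": [(0.95, 1e-3)] * 5,                  # five intermediate detections
}

# Per-pebble P_FA that applies across the whole field in each case.
FIELD_PFA = {"case 1": 1e-2, "case 2": 1e-4, "case 3": 1e-3}


def summarize(case):
    """Return (false alarms per field, per beam, five-pebble P_D, five-pebble P_FA)."""
    fa_field = PEBBLES * FIELD_PFA[case]
    fa_beam = fa_field * BEAM_FRACTION
    # Independence assumption: joint probabilities are products.
    p_d = math.prod(pd for pd, _ in CASES[case])
    p_fa = math.prod(pfa for _, pfa in CASES[case])
    return fa_field, fa_beam, p_d, p_fa


for name in CASES:
    fa_field, fa_beam, p_d, p_fa = summarize(name)
    print(f"{name}: {fa_field:.0f} per field, {fa_beam:.1f} per beam, "
          f"P_D = {p_d:.2f}, P_FA = {p_fa:.0e}")
```

Under the independence assumption, the five-pebble P_D values come out near 0.86, 0.59, and 0.77, which round to the 0.9, 0.6, and 0.8 listed in the table.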

These cases suggest several possible options in pebble network design. Common to all three 
cases is the need to, in some way, measure propagation conditions as they likely vary over seconds, 
minutes, or hours. Measurements could take the form of either direct measurement of propagation 
loss or signal strength or monitoring the false alarm density as an indication of propagation effects. 
To first order, local effects such as specular multipath from buildings or blockage from shrubs or 
ridges are not expected to degrade overall network sensitivity. Blind spots (poor 
propagation) or enhanced sensitivity zones (enhanced signals) may influence when a detection is 
made or whether an intruder track is maintained consistently, but overall network performance is 
likely to be maintained. 

To keep added design features, and hence cost, to a minimum, the following 
design options are discussed. 


26.2.3.2 Test Tones and Gain and Sensitivity Control 


Periodically, all pebbles could be commanded by the remote monitors to send brief tones at a test 
frequency (other than the frequency of the cue tone). Depending on received signal strengths, the 
pebbles would change the receiver sensitivity or the gain of the transmitter or receiver amplifier. 
The remote receiver monitors themselves could also adjust receiver sensitivity during these pebble 
calibration transmissions to ensure a detection threshold for the prescribed number of pebble tones 
per beam (e.g., 5-10) based on the test tones. The cost of adding the circuit is expected to be 
negligible. The most likely drawback is power consumption. However, the test tone could occur 
in much less than 1 s. 


26.2.3.3 False Alarm Monitoring 


In a significant size field of, for example, 10,000 pebbles, false alarm statistics per monitor beam 
position may be gathered by the remote receiver. If a remote monitor never detects random false 
alarms per beam position (perhaps via a test receiver channel with higher sensitivity than the normal 
monitor channel), it is likely that either nominal performance or case 2 performance is in effect. 
However, if the receiver is consistently detecting two or more apparently random false alarms in 
more than one beam position, this would be indicative of low propagation loss (case 1). One option 
under this condition would be for the remote receiver to command the pebbles in that area to a 
nominal case 3 condition. Although detection performance is somewhat lower than the nominal 
case, it might be adequate. 

A no-cost alternative could be to have a person walk through a pebble field on occasion or 
trigger a few dispersed test pebbles to emit tones to test the response of the field. 

The advantage of the false alarm monitoring approach is that no additional design feature would 
be required for the pebbles except the ability to receive a command to change to a case 3 threshold 
setting. 
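
The false alarm monitoring rule described above can be sketched as a small decision function. This is a hypothetical illustration: the function name, the string labels, and the "undetermined" fallback for situations the text does not cover are all assumptions, and the input is taken to be a list of apparently random false alarm counts, one per monitor beam position.

```python
def assess_propagation(false_alarms_per_beam, min_count=2):
    """Classify the field state from false alarm counts per beam position.

    Follows the rule of thumb in the text: no false alarms anywhere suggests
    nominal or case 2 conditions, while two or more false alarms in more than
    one beam position suggests low propagation loss (case 1).
    """
    busy_beams = sum(1 for count in false_alarms_per_beam if count >= min_count)
    if busy_beams > 1:
        # Candidate for commanding the affected pebbles to case 3 thresholds.
        return "case1_low_loss"
    if not any(false_alarms_per_beam):
        return "nominal_or_case2"
    # The text does not say how to treat intermediate observations.
    return "undetermined"


print(assess_propagation([0] * 10))
print(assess_propagation([3, 0, 2, 0, 0, 0, 0, 0, 0, 0]))
```

In practice the counts would need to be accumulated consistently over time before acting, since the text refers to the receiver "consistently" detecting false alarms; the sketch evaluates a single observation only.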


26.2.4 Swarming Simulation 


A simulation was developed to gain further first-order insights into the swarming behavior and 
to perform some initial investigations into system performance. It is 
not unlike Mitch Resnick’s “Star Logo” simulation [7] that mimics the behavior of slime mold, 
except that the sensors in our simulation are not allowed to move. Basically, a large number of 
pebble sensors are deployed, each with a very simple rule set governing its behavior, such that the 
collective behavior could indicate “intrusions” that could be remotely monitored. 

The simulation models the case of 20,164 pebble sensors distributed in an evenly spaced grid 
over 1 km² (for a grid spacing of 7 m). A mouse interface allows insertion of a moving intruder 
through the grid. At each time step, the state of each sensor is updated to reflect intruder detections, 
communication among nearby sensors, false alarms, etc. The result is effectively a cellular automaton 
with the mouse-controlled intruder as an additional external stimulus. 

Each sensor’s cued “alert state” is indicated by gray levels. On the simulation display, each white 
pixel indicates a “cold” sensor that is not detecting or receiving, and each black pixel indicates a 
sensor that has detected an intruder and is emitting a tone to be received by nearby sensors that in 
turn respond by temporarily decreasing their detection threshold to the more sensitive cued setting, 
indicated by the gray pixels. 

At each time step, the process of updating the state of each cell in the automaton—that is, each 
sensor in the grid—consists of a single Bernoulli trial. The probability of “success” (i.e., detection 
or false alarm) for a sensor is determined by two factors: the current detection threshold state of 
a sensor (“cold” or “cued”) and a unit step function (detection or false alarm) of range to the 
intruding target, if one exists. This detection range is fixed at 10 m. Given the resulting probability 
of success, a uniform pseudorandom number in the unit interval determines whether the sensor 
has a detection or false alarm. In the event of a detection or false alarm, the detection threshold 
state of all sensors within communication range (also fixed at 10 m) is lowered to the “cued” state. 

Some performance parameters of the sensor are adjustable by slider controls. The default “cold” 
probabilities of detection and false alarm are 0.9 and 10^-4, respectively. In the “cued” state, with 
a lowered detection threshold, the default probabilities of detection and false alarm are 0.99 and 
10^-2, respectively. The detection and communication ranges of each sensor are fixed at 10 m, 
the effect being that the neighborhood of influence of each sensor is fixed to a subset of eight 
surrounding sensors. 
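
The update rule described above can be sketched as a small cellular automaton. This is a minimal illustration rather than the authors' simulation: it assumes one time step per second (so a cue timeout of five steps stands in for the 5 s persistence), a square grid indexed from the origin, and a hypothetical class name `PebbleField`.

```python
import math
import random

GRID_SPACING_M = 7.0    # grid spacing from the text
DETECT_RANGE_M = 10.0   # sensor detection range (fixed in the text)
COMM_RANGE_M = 10.0     # cue tone communication range (fixed in the text)


class PebbleField:
    """Fixed grid of pebbles, each either cold or temporarily cued."""

    def __init__(self, n, cold=(0.9, 1e-4), cued=(0.99, 1e-2),
                 cue_timeout_steps=5, rng=None):
        self.n = n
        self.cold = cold    # (P_D, P_FA) at the cold threshold
        self.cued = cued    # (P_D, P_FA) at the cued threshold
        self.cue_timeout_steps = cue_timeout_steps  # assumes 1 s time steps
        self.rng = rng or random.Random()
        # Remaining cued time per pebble in steps; 0 means cold.
        self.cue_timer = [[0] * n for _ in range(n)]
        # Grid offsets of neighbours within communication range.
        reach = int(COMM_RANGE_M // GRID_SPACING_M)
        self.neighbours = [
            (di, dj)
            for di in range(-reach, reach + 1)
            for dj in range(-reach, reach + 1)
            if (di, dj) != (0, 0)
            and math.hypot(di, dj) * GRID_SPACING_M <= COMM_RANGE_M
        ]

    def step(self, intruder_xy=None):
        """Advance one time step and return the set of emitting pebbles."""
        emitting = set()
        for i in range(self.n):
            for j in range(self.n):
                p_d, p_fa = self.cued if self.cue_timer[i][j] > 0 else self.cold
                in_range = (
                    intruder_xy is not None
                    and math.hypot(i * GRID_SPACING_M - intruder_xy[0],
                                   j * GRID_SPACING_M - intruder_xy[1])
                    <= DETECT_RANGE_M
                )
                # Unit step in range: P_D inside detection range, P_FA outside.
                p_success = p_d if in_range else p_fa
                if self.rng.random() < p_success:  # single Bernoulli trial
                    emitting.add((i, j))
        # Age existing cues, then cue the neighbours of every emitter.
        for i in range(self.n):
            for j in range(self.n):
                if self.cue_timer[i][j] > 0:
                    self.cue_timer[i][j] -= 1
        for i, j in emitting:
            for di, dj in self.neighbours:
                a, b = i + di, j + dj
                if 0 <= a < self.n and 0 <= b < self.n:
                    self.cue_timer[a][b] = self.cue_timeout_steps
        return emitting


field = PebbleField(142, rng=random.Random(1))  # 142 x 142 = 20,164 pebbles
tones = field.step(intruder_xy=(500.0, 500.0))
print(len(tones), "pebbles emitting after one step")
```

With 7 m spacing and a 10 m communication range, the neighbour list built in the constructor contains exactly the eight surrounding grid positions, matching the neighbourhood described in the text.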

Figure 26.10 shows pebble detections as black dots and pebbles receiving a cue as gray 
dots. The dashboard on the right allows the user to adjust the key zone design parameters of cold 
and cued pebble communication tone detection and false alarm probabilities over one and several 
orders of magnitude, respectively. Also shown is a sliding adjustment of cue tone persistence time 
from 0 to 5 s. 
Figure 26.10 Simulation false alarm snapshot at zone requirements to first order. 


The simulation was useful in testing the stability of the network. Figure 26.10 illustrates a 
typical picture of the display with random false alarm detections for a pebble packing density of 
7 m separation. The lack of correlation (less than five pebbles detecting in a local area) indicates 
that a remote receiver with a directive antenna beam that scans over the pebble field would not 
receive the requisite signal level to expect an intruder. 

Figure 26.11 illustrates an intruder moving through the sensor field and the sensors having 
detected the intruder, leaving a persistent trail of transmitting pebbles for a directive remote 
receiver to detect. If the remote receiver display retains the pebble transmission history and its 
beam directivity is sufficiently focused, then the track of an intruder could be followed. 

Figures 26.12 and 26.13 illustrate instabilities of the sensor field. Figure 26.12 shows the false 
alarms if the cold detection false alarm probability is increased by a factor of 10 from 10⁻⁴ to 10⁻³. 
Enough individual false alarms occur that a directive remote receiver could be receiving a spatially 
correlated signal for a false intruder alert, as shown. Figure 26.13 is the case for which the cold 
detection false alarm probability is retained at the nominal 10⁻⁴, but the false alarm probability for 
a cued detection is increased by a factor of 10 from 10⁻² to 10⁻¹. Swarm “clouds” appear within 
seconds due to a high percentage of cued false detections. Moving these false alarm probabilities, 
shown in Figures 26.12 and 26.13, back to their nominal values causes the sensor field to calm 
down to the picture of Figure 26.9 in a matter of seconds. 

Other insights gained from the sensor net model include the following: 


■ The pebble density, expressed as the average distance between near-neighbor pebbles, should 
not be too high, or a cued “swarming” instability such as that in Figure 26.13 can commence 
from an intruder detection. Too low a density will provide insufficient internode interaction 
to respond to an intruder with sufficient cued detections. 


[Figure 26.11 dashboard settings: cold Pd = 0.9000, cold Pfa = 0.0001, cued Pd = 0.9900, 
cued Pfa = 0.01, cued timeout = 5.0 s. Annotations: intruder; track persistence from 5 s pebble 
transmission.] 

Figure 26.11 Simulation of false alarms and intruder track for nominal zone requirements. 

Figure 26.12 Simulation false alarm snapshot at 10 times higher rate than zone requirement 
for cold detection. 

Figure 26.13 Simulation false alarm snapshot at 10 times higher rate than zone requirements 
for cued detection. 


■ Reducing the tone timeout from 5 to 1 or 2 s cleans up the false alarm picture (Figure 26.10) 
but also reduces the persistence of the track picture at the remote receiver. 


Whereas this simple system features stationary pebbles exhibiting simple behavior, additional 
features can be conceived that would exhibit further emergent behavior. 


■ Pebble motion toward the detected intruder to further maintain sensor contact, i.e., kinematic 
swarming, could be incorporated. This would require some damping mechanism, such 
as slow speed, so that one intruder does not empty the field in some locations, allowing 
another intruder to enter undetected due to insufficient pebble density. 

■ The feature of identification via algorithms or a mix of sensor types for correlating could 
narrow the types of intruders of interest, for example, humans or motor vehicles versus wild 
animals. A zone condition for probabilities of correct and false identifications would need 
to be added. 

■ Transmission (spraying) of taggants at the intruder by detecting sensor pebbles, such as small 
RF tags, IR emitters, or phosphorescent dust, would allow for subsequent human or other 
sensor tracking beyond the pebble field. 


26.2.5 Architecture 


As described earlier, the objective intrusion detection system consists of “pebbles,” small, simple 
sensor nodes dispersed throughout the environment being monitored, and a base station, or 
remote receiver, located at a command post within transmission range of the pebble field. While 
the approach ultimately relies on the eventual development of small, extremely discreet, inexpensive 
devices, our prototype implementation uses widely available mote hardware to investigate swarming 
sensor network behavior in the real world. We utilize functionality of the motes that is envisioned 
to be feasible in these small devices, which could be produced in quantity. 


26.2.5.1 Preliminary Hardware Design 


Critical to the affordability and scalability of the intruder sensor network concept is pebble tech- 
nology that supports small size, low power consumption, and low cost. To support the swarming 
network concept, each pebble would contain a sensor, a microwave transceiver for two-way tone- 
based communications, an antenna, a power supply, a controller, and a suitable container to 
disguise and protect the sensor node components. A variety of sensor techniques could be used 
for intruder detection. The microwave blockage signal detection concept identified earlier would 
require the addition of a microwave receiver and antenna, both at 20 GHz or higher. 

Over the past few years, a number of different motes have been designed and built. Network 
experiments described in this chapter use a MICAz Mote [8]. The MICAz Mote has a form factor 
of 5.7 × 3.18 × 0.64 cm and uses two size AA batteries. The MICAz mote is larger and more 
expensive than this application requires, and its radio and microprocessor/controller 
are significantly more complex than needed for our application. Recently, a team of researchers 
at the University of California, Berkeley developed the 2.5 × 2.0 mm Spec Mote [9]. The Spec 
Mote is a fully working single-chip mote, which has a RISC core and 3 kB of memory, uses a 
902.4 MHz radio, and has been shown to communicate about 12 m with a data rate of 19.2 kbps. 
The Spec Mote shows the potential for integrated circuit (IC) technology to facilitate reduced size 
and volume production costs. 

Although similar in technology, existing motes do not perform the functions needed by the 
proposed concept, particularly the need for both a two-way tone-based signaling channel and 
potentially also a blockage detection receiver. Conversely, existing motes are more functionally 
complex than needed for the swarming network concept. To further establish size, performance, 
and cost feasibility of the intruder detection pebble, a design concept is being developed, as 
described in the following paragraphs. 

There are two major functional aspects of the proposed pebble: (1) a sensor mechanism and 
(2) a tone-based RF signaling mechanism. The key to achieving small size and low cost is to 
implement the RF signaling mechanism on an RF IC. The RF signaling mechanism has two primary 
modes of operation. In the receive mode, the pebble listens for alert (cue) tones from other nearby 
pebbles. When an alert tone is received, the pebble lowers its sensor detection threshold. When the 
pebble’s sensor makes a detection, the pebble is switched to a transmit mode in which it transmits 
an alert tone for a fixed amount of time. The RF signaling mechanism requires the following 
functions: a detector for power sensing of the RF alert tone, a tone generator, a timer, registers 
and comparators with very simple control logic, a small antenna for RF transmission and reception, 
amplifiers including a low-noise amplifier on receive and possibly a power amplifier on transmit, 
and a power source. 

The receive detector, transmit tone generator, control circuits, and any amplifiers would be 
included on a single RF IC. The frequency is assumed to be 2.4 GHz. RSSI circuitry is used for the 
receive detector and a voltage-controlled oscillator (VCO) is used for the tone generator circuitry. 
An off-chip crystal provides suitable frequency stability and drift for the tone signals. Stability is an 
important parameter because it drives the noise bandwidth and integration time that can be used 
at the remote receiver. Using a VCO alone to drive the antenna provides -10 dBm of power to the 
antenna. Higher-power levels can be achieved by adding a power amplifier, but at the cost of higher 
power consumption during transmit. Using function circuit blocks and the Jazz Semiconductor 
Multi-Project SiGe process, a preliminary RFIC design is shown in Figure 26.14. Chip cost is 
estimated to be less than $2 in volume production. 

Figure 26.14 RFIC block diagram. 

Several options are available for the antenna, including wire antennas and surface-mount 
antennas. Significant recent development has addressed the miniaturization of board-level omnidi- 
rectional antennas at 2.4 GHz for wireless applications. One example is the NanoAnt technology 
from Laird Technologies [10]. This technology operates in the 2.4 GHz Bluetooth band and is 
designed for high-volume surface-mount attachment through use of pick-and-place processing. 
This antenna measures 2.5 x 2.0 x 2.0 mm and reportedly costs $1.10 in high volume. This and 
other antennas will be evaluated as part of our concept development effort. 

A preliminary board layout containing a chip antenna, the RFIC, and a temperature-controlled 
crystal oscillator was developed and provided a size of 14 x 13 mm. The board size may have to 
be increased pending study of the ground planes necessary for chip-based antennas. The pebble 
concept would include a 3 V cell battery mounted on the bottom side of the RF signaling board. 
Three-volt coin batteries are available in a variety of capacities and cost under $1 in quantity. 

A simple rugged container is envisioned to be very inexpensive. The simplest sensor, a micro- 
phone manufactured by Horn Industrial Company, is available for $0.16 in batches of 1000 or 
more from Digi-Key Corporation [11]. The upper bound volume cost per unit for a pebble with a 
simple microphone is therefore less than $5. To add a 20 GHz circuit, as the illumination blockage 
detection would require, a receive-only RFIC circuit with a somewhat smaller size chip could cost 
about $1.50. A separate antenna would be required. The bounded volume per unit cost of the 
illumination sensor pebble would therefore be expected to be less than $7.50. 
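
The cost bounds above can be checked with simple arithmetic. The sketch below only tallies the figures quoted in the text; the dictionary keys and the `headroom_cents` helper are illustrative names. No cost is assumed for items the text does not price (container, board, crystal oscillator, second antenna), which is why the result is expressed as headroom under each stated bound.

```python
# Per-unit volume cost bounds quoted in the text, in US cents.
BASE_PEBBLE_CENTS = {
    "rf_ic": 200,         # "less than $2"
    "chip_antenna": 110,  # "$1.10 in high volume"
    "coin_battery": 100,  # "under $1 in quantity"
    "microphone": 16,     # "$0.16 in batches of 1000 or more"
}
BLOCKAGE_RX_IC_CENTS = 150  # "about $1.50" for the 20 GHz receive-only RFIC


def headroom_cents(itemized_cents, bound_cents):
    """Return how much of the stated bound is left for items the text does not price."""
    return bound_cents - sum(itemized_cents)


base_total = sum(BASE_PEBBLE_CENTS.values())
base_headroom = headroom_cents(BASE_PEBBLE_CENTS.values(), 500)  # bound: $5
illum_total = base_total + BLOCKAGE_RX_IC_CENTS
illum_headroom = headroom_cents([illum_total], 750)              # bound: $7.50
```

Under these figures, the itemized microphone pebble comes to $4.26, leaving $0.74 for unpriced parts, and the illumination sensor pebble comes to $5.76, leaving $1.74 for the separate antenna and other unpriced parts.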


26.2.5.2 Prototype Software for Audio Microphone Sensors in MICAz Motes 


The software running on the pebbles in the intrusion detection network prototype using MICAz 
motes is described in this section. The pebbles can be in one of three different timed modes: 
(1) time synchronization, (2) swarming, and (3) data transmit. 

The time synchronization mode ensures that timers kept by all of the motes in the network 
are synchronized in order to get consistent real-time data. One mote in the network is specially 
programmed to act as the master mote and the rest act as slave motes. In time synchronization 
mode, the master mote waits a specified period of time, transmits a packet, and immediately 
enters swarming mode. The slave motes wait to receive this packet, upon which they immediately 
enter swarming mode as well. The time synchronization mode is the initialization mode for the 
network, prompting all motes to enter the swarming mode at the same time. While not essential to 
the swarming behavior and the longer-term pebbles implementation, time synchronization in the 
prototype allows us to accurately correlate event and power data over time among all nodes. 

The swarming mode produces the “instinctive” actions of each node that contribute to the 
ultimate swarming behavior of the network. This mode, which can be modeled as a state machine, 
implements the actual intrusion detection using the acoustic sensors on the motes and simple 
tone-based communication. There are four different states in this mode as shown in Figure 26.15. 
The mote's LEDs provide a visual indication of its state. 

State 1 is the normal detection mode, in which the pebble is not emitting any cue tones and 
its microphone detection threshold is set to the normal, baseline level. Once the pebble detects a 
sound event, indicated in Figure 26.15 by the “Sensor event” transition out of State 1, the pebble 
enters State 3, in which it emits a 2.48 GHz unmodulated cue tone. The pebble returns to State 1 
after t₁ seconds (the cue tone timer period), indicated by the “Cue tone timer fired” transition 
between State 3 and State 1 in the figure. 

If a pebble in State 1 detects a radio event (i.e., its RSSI readings are above the specified 
threshold, indicating that a neighboring pebble is emitting the cue tone), it moves into State 2 
and lowers its microphone detection threshold. The pebble initiates a low threshold timer upon 
detecting a radio event. If the pebble still has not detected a sound when the low threshold timer 
fires (t₂ seconds after the radio event), the pebble returns to State 1. If, on the other hand, the 
pebble detects a sound exceeding the lower threshold, the sensor event causes a transition from 
State 2 to State 4, in which the pebble begins transmitting the cue tone as well. 

In State 4, the pebble will continue sound detection at the lower threshold level while emitting 
the cue tone. Similar to the behavior in State 3, the pebble initiates the cue tone timer when it 
begins emitting the tone. Should the cue tone timer fire before the low threshold timer, the pebble 
will return to State 2. If, instead, the low threshold timer fires, the pebble will move into State 3 
and reset the detection threshold to the normal value. 

Figure 26.15 Pebble software state diagram. 
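
The four swarming-mode states and the transitions described in the text can be written as a small lookup table. This is an illustrative sketch rather than the prototype's mote code: the state and event names are invented here, only the transitions stated in the text are included, and any other state and event pair is assumed to leave the state unchanged.

```python
from enum import Enum


class State(Enum):
    NORMAL = 1         # State 1: normal threshold, no cue tone
    LOW_THRESHOLD = 2  # State 2: lowered threshold, no cue tone
    TONE_NORMAL = 3    # State 3: emitting cue tone, normal threshold
    TONE_LOW = 4       # State 4: emitting cue tone, lowered threshold


# (current state, event) -> next state, as described in the text.
TRANSITIONS = {
    (State.NORMAL, "sensor_event"): State.TONE_NORMAL,
    (State.NORMAL, "radio_event"): State.LOW_THRESHOLD,
    (State.LOW_THRESHOLD, "low_threshold_timer_fired"): State.NORMAL,
    (State.LOW_THRESHOLD, "sensor_event"): State.TONE_LOW,
    (State.TONE_NORMAL, "cue_tone_timer_fired"): State.NORMAL,
    (State.TONE_LOW, "cue_tone_timer_fired"): State.LOW_THRESHOLD,
    (State.TONE_LOW, "low_threshold_timer_fired"): State.TONE_NORMAL,
}


def next_state(state, event):
    """Return the next state; unlisted (state, event) pairs are assumed to be ignored."""
    return TRANSITIONS.get((state, event), state)


def emits_cue_tone(state):
    return state in (State.TONE_NORMAL, State.TONE_LOW)


def threshold_lowered(state):
    return state in (State.LOW_THRESHOLD, State.TONE_LOW)


# A cued pebble hears its neighbor, detects the intruder, and then times out.
trace = [State.NORMAL]
for event in ("radio_event", "sensor_event",
              "low_threshold_timer_fired", "cue_tone_timer_fired"):
    trace.append(next_state(trace[-1], event))
```

The trace walks through States 1, 2, 4, 3, and back to 1, which is the path a pebble takes when the low threshold timer fires while it is still emitting the cue tone.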

The swarming mode can be set to run for any amount of time, depending on the length of the 
experiment. After the swarming mode completes, the mote enters the data transmit mode, during 
which each mote sends its event and time data as packets to the base station. The motes send 
their data one after another using a timer offset based on their ID (identity) number to prevent 
transmission overlap. The MICAz mote at the base station receives the data packets and forwards 
them to the base station computer over the serial port. The motes’ LEDs indicate which mote is 
sending packets during the data transmit mode. 
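
The ID-based timer offset can be illustrated as follows. The text does not give the slot length or the exact offset rule, so the linear rule and the 2 s slot below are hypothetical; the sketch also assumes each mote finishes sending within its own slot.

```python
def transmit_offsets_s(mote_ids, slot_s):
    """Map each mote ID to the delay before it starts sending (hypothetical linear rule)."""
    return {mote_id: mote_id * slot_s for mote_id in mote_ids}


# Eight motes with IDs 0-7 and an assumed 2 s slot per mote.
offsets = transmit_offsets_s(range(8), slot_s=2.0)
```

Because the start times are spaced by one full slot, transmissions do not overlap as long as the slot is longer than the time a mote needs to send its stored data.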


26.2.6 Initial Experiments 


To assess the impact of propagation loss in a tone-based network, a series of tests were conducted to 
collect RF data in real-world settings. The experiments used the MICAz platform [8] from Crossbow 
Technology, Inc., to emulate the pebbles. These programmable motes (short for remotes) have 
a simple, low-power microcontroller and communicate via a Chipcon CC2420 radio chip. The 
CC2420 transceiver has features that enable a realistic simulation of the swarming pebbles network. 
The transceiver can be programmed to transmit an unmodulated sine wave at approximately 
2.4 GHz and has the receive signal strength indicator (RSSI) circuitry to report approximate 
received signal levels. The specifications of the MICAz radio are shown in Table 26.5. 

For our experiments, we configured the MICAz motes to transmit a 2.48 GHz tone and measured 
the effective radiated power (ERP) and the transmit frequency of several motes to characterize 
performance. We measured the ERP of 13 motes in an anechoic chamber. The ERP of 12 of the 
motes varied from −2.6 to 1.7 dBm; one mote was found to be significantly lower in ERP. We 
measured the frequency across 6 motes and found the tone frequencies to be tightly grouped within a 
30 kHz span. The excellent repeatability across motes of the tone frequencies is conducive to using a 
small noise bandwidth to maximize detection range at a remote receiver. The measured performance 
of these motes compares well to the pebble performance we assumed in our connectivity analyses 
in Section 26.2.2 [12], where we assumed a receiver noise bandwidth of 50 kHz. 

The first experiment conducted was to measure the mote-to-mote connectivity, which relates 
to the required mote spacing for the swarming network concept. In this experiment, a MICAz 
Mote was configured to transmit a 2.48 GHz tone and the connectivity to another MICAz Mote 
was measured using the mote’s RSSI. The RSSI measurement is not a precise measurement; the 
mote’s radio specification quotes an RSSI accuracy of ±6 dB. The experiments were conducted on 
a flat grass field that contained nearby structures and fencing. A picture of the field is shown in 
Figure 26.16. 


Table 26.5 MICAz Radio Specifications 

Parameter             Specification 
Frequency band        2400-2483.5 MHz 
Transmit data rate    250 kbps 
Transmit power        −24 to 0 dBm 
Receive sensitivity   −90 dBm (minimum), −94 dBm (typical) 


Figure 26.16 Grass field where experiments were conducted. 


One set of measurements was taken with both transmit and receive motes placed on the 
grass, and another set of measurements was taken with transmit and receive motes placed on 
0.61 m-high tripods. The preliminary results show that elevated motes without the presence of 
blockage structures can detect alert tones over relatively large separations consistent with free-space 
propagation and little fading. The results for motes placed on the grass are much more limiting. 
These results show a significant attenuation, roughly on the order of 3 dB/m. Results to date suggest 
that embedding pebbles in a dielectric layer (e.g., Styrofoam) to elevate them off the ground could 
provide tailorable propagation range, depending on the layer thickness. 
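
To see how strongly a ground attenuation of about 3 dB/m limits range compared with free-space propagation, the following sketch combines the standard free-space path loss formula with a per-meter excess loss. Treating the measured ground attenuation as a simple additive term, ignoring antenna gains, and using the 0 dBm maximum transmit power and −94 dBm typical sensitivity from Table 26.5 are all assumptions made for illustration; the function names are not from the source.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light


def fspl_db(distance_m, freq_hz):
    """Free-space path loss between isotropic antennas."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / C_M_PER_S)


def received_power_dbm(tx_dbm, distance_m, freq_hz, excess_db_per_m=0.0):
    """Received power with an assumed additive excess loss per meter."""
    return tx_dbm - fspl_db(distance_m, freq_hz) - excess_db_per_m * distance_m


def max_range_m(tx_dbm, sensitivity_dbm, freq_hz, excess_db_per_m=0.0):
    """Bisect for the largest distance at which the signal still meets the sensitivity."""
    lo, hi = 0.01, 1.0e5
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if received_power_dbm(tx_dbm, mid, freq_hz, excess_db_per_m) >= sensitivity_dbm:
            lo = mid
        else:
            hi = mid
    return lo


free_space_m = max_range_m(0.0, -94.0, 2.48e9)                     # elevated motes
on_grass_m = max_range_m(0.0, -94.0, 2.48e9, excess_db_per_m=3.0)  # motes on the grass
```

Under these assumptions the free-space range is several hundred meters, while the 3 dB/m case collapses to roughly 11 m, which illustrates why elevating the pebbles even slightly changes the usable spacing so much.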

An important aspect of the swarming network concept is the ability to achieve sufficient 
connectivity to the remote receiver that is monitoring the sensor field. Factors that affect the 
connectivity include the following: 


■ Physical limitations on pebble transmit power due to the need for low power consumption 
and electrically small antennas 

■ Propagation effects such as multipath, fading, and blockage 

■ The expectation that the tone signals emitted by the individual pebbles are not adding 
coherently at the remote receiver due to variations in their transmit signal phase and electrical 
path lengths to the receiver 

■ The variation of the aggregate emitted signal due to the dynamic nature of the network 


The connectivity experiment used a MICAz mote configured to transmit a tone at 2.4 GHz 
and measured the signal received by a horn antenna and spectrum analyzer at various separation 
distances. Again, the measurements were taken on the grass field shown in Figure 26.16, for three 
sets of data. 

The first two sets were taken on different dates and used a MICAz mote on the grass. For the first 
set of data, the horn was approximately 0.6 m high; for the second set of data, the receive horn was 
1.35 m high. For the third set of data, the transmitting mote was placed on a 7.6 cm high plastic 
cup. The second set of data for the mote on the ground indicates reduced performance. Because 
these two sets of data were taken on separate dates with some differences in equipment, there could 
be various reasons for the difference (e.g., variations in mote transmit power and variations in the 
measurement setup). The original connectivity calculations allocated a 15 dB propagation loss to 
account for the effects of fading, foliage, etc. The theoretical curve is consistent with the mote 
elevated 7.6 cm and is optimistic relative to a mote resting on the ground. 

Figure 26.17 Noncoherent combining experiment. 

An experiment was conducted to test the assumption that tone signals transmitted from a 
collection of pebbles will add noncoherently at a remote receiver even for just a few radiating 
pebbles. For this experiment, conducted in an anechoic chamber, the receive signal from one, two, 
four, and then six transmitting motes was measured (Figure 26.17). The tone signals from the 
motes themselves are noncoherent. Two measurements were taken with four transmitting motes; 
the mote locations were varied for these two measurements. The motes were located in a line with 
separations of several centimeters and at a distance of 10 m from the receive horn. Figure 26.17 
indicates that the motes exhibited noncoherent combining as expected. 
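
The difference between noncoherent and coherent combining can be illustrated numerically. The sketch below is not a model of the chamber measurement: it assumes equal-amplitude tones whose relative phases are independent and uniformly random, and averages the combined power over many random draws. For N such sources the mean power grows as N (10 log10 N dB), whereas perfectly phase-aligned sources would grow as N² (20 log10 N dB).

```python
import cmath
import math
import random


def mean_combined_power(n_sources, trials=20000, seed=0):
    """Average power of n unit-amplitude tones with independent random phases."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        field = sum(cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
                    for _ in range(n_sources))
        total += abs(field) ** 2
    return total / trials


def noncoherent_gain_db(n_sources):
    """Expected gain over one source when powers add."""
    return 10.0 * math.log10(n_sources)


def coherent_gain_db(n_sources):
    """Gain over one source if all tones arrived in phase."""
    return 20.0 * math.log10(n_sources)


power_4 = mean_combined_power(4)   # close to 4 times the single-mote power
gain_6 = noncoherent_gain_db(6)    # about 7.8 dB for six motes
```

For the mote counts used in the experiment (one, two, four, and six), noncoherent combining predicts gains of about 0, 3, 6, and 7.8 dB relative to a single mote.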


26.2.7 Prototype Evaluation 


Once we completed the implementation and simulation, we defined experiments to answer the 
following questions: 


1. Does the pebble field exhibit swarming behavior in response to stimulus? 

2. Is the remote receiver able to detect signals from the swarming field? 

3. How does the configuration of the pebble field affect the network? 

4. How reliable is the intrusion detection (e.g., what is the probability of detecting an event 
and how often are false events detected)? 


We conducted experiments in a mostly empty rectangular room with a concrete floor on which 
we directly placed the motes. Characterization tests included raising the motes from the floor by 
placing them on cups; however, we found that in our space, the increased transmission range 
caused higher false-positive detections. We tested pebble fields containing 8 and 25 motes arranged 
in several configurations. The remote receiver station was situated approximately 20 ft from the 
pebble field such that all pebbles fell within the line of sight of the receiver’s directional antenna. 


26.2.7.1 Configurations 


We conducted initial experiments using eight pebbles in two different geometrical configurations 
that are shown in Figure 26.18a and b. In configuration (a), the pebbles are in a grid layout with 
the antenna placed 25 ft away from the pebble field’s edge. In configuration (b), the pebbles are 
in a line with the antenna placed 20 ft away from the pebble at one end of the line. The pebbles 
are labeled with the numbers 0-7, which we use to identify them later in Section 26.2.7.2 for 
each configuration. We simulate an intruder with a constant sound tone (we refer to this as the 
“stimulus tone”) played while moving in the indicated direction. Using a constant stimulus tone 
and well-defined path allows us to perform controlled experiments. Future work will vary the 
simulated intruder’s sound level and mobility pattern. 

All mote configurations used the same parameter settings. The cue tone power level must be low 
enough that only immediately nearby pebbles can detect it, which reduces false alarms. The level 
must be high enough, however, that the receiver antenna can detect cue tones. We performed 
preliminary characterization tests in the indoor testing environment to measure power levels and 
distances required for detection. These tests indicated the appropriate tone power level, pebble 
spacing, sensitivity thresholds, and the distance to the receiver antenna for our specific setup. 
Table 26.6 lists the mote parameter settings that we used in all of our experimental configurations. 

Note that the threshold levels are relative to the mote’s running average of raw RSSI and 
microphone readings; they are not absolute physical values at which events are triggered. 
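
A threshold that is relative to a running average can be sketched as below. The text does not say how the motes compute the running average, so the exponential moving average, the smoothing factor, and the decision to freeze the baseline while an event is present are all assumptions; the class name is illustrative. The threshold values 20 and 10 are the normal and lowered microphone settings from Table 26.6.

```python
class RelativeThresholdDetector:
    """Flag an event when a raw reading exceeds the running average by a threshold."""

    def __init__(self, threshold, alpha=0.1):
        self.threshold = threshold  # in raw reading units, relative to the baseline
        self.alpha = alpha          # assumed smoothing factor for the running average
        self.baseline = None

    def update(self, reading):
        if self.baseline is None:
            self.baseline = float(reading)  # first sample only seeds the baseline
            return False
        event = (reading - self.baseline) > self.threshold
        if not event:
            # Track slow drift in the background, but not the event itself.
            self.baseline += self.alpha * (reading - self.baseline)
        return event


mic = RelativeThresholdDetector(threshold=20)      # normal microphone threshold
quiet = [mic.update(r) for r in (100, 100, 100)]   # background only
missed = mic.update(115)                           # 15 above baseline: below threshold
mic.threshold = 10                                 # lowered threshold after a cue tone
caught = mic.update(115)                           # same sound now counts as an event
```

The example shows the intended effect of cueing: a sound that is ignored at the normal threshold triggers a sensor event once the threshold has been lowered.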


26.2.7.2 Results 


The first configuration we examine is the eight-pebble grid shown in Figure 26.18a. Figures 26.19 
and 26.20 show the results from one individual run of the experiment using the grid configuration. 
The goal of our experiments using the eight-pebble field was to validate the swarming network 
concept in a real implementation and physical environment. To do this, we first examine the events 
occurring at each pebble during the experiment. 

Figure 26.18 Experiment configurations. (a) Grid configuration. (b) Line configuration. 

Table 26.6 Mote Parameters Used in Experiments 

Parameter                            Value 
Cue tone timer value                 8 s 
Low threshold timer value            20 s 
Cue tone power level                 25 dBm 
Mote cue tone detection threshold    5 
Microphone threshold (normal)        20 
Microphone threshold (lowered)       10 

Figure 26.19 Event chart for grid configuration. 


Figure 26.19 shows an event chart that displays the event and time data collected by each pebble 
during a representative experiment run. The mote IDs on the event chart refer to the numbering 
of the pebbles in Figure 26.18a. In this chart, we can see a successful swarming network that is 
responding to stimulus. For example, when we initiated the stimulus tone, Mote 2 detected a 
sensor event and transmitted the cue tone, which caused its neighbors, Motes 1, 4, and 5, to lower 
their thresholds. As the stimulus tone source moved in the direction indicated in Figure 26.18a, 
Mote 5 and eventually Mote 7 also detected sensor events and began emitting cue tones, causing 
their neighbor motes’ thresholds to be lowered. 

Figure 26.20 shows the power level received over time at the antenna during the same experiment 
run, as collected from the spectrum analyzer by the automated LabVIEW program. The peaks in 
the power level, which can be correlated to the event-time data from the motes, show that the 
remote receiver can successfully detect the tones emitted by the motes. Note that the second power 
level rise, caused by Mote 2 emitting a tone, is an example of a false detection. This shows the 
importance of detection thresholds both at the pebbles and the base station. 

Figure 26.20 Power levels at receiver station for grid configuration. 

Our next series of experiments used the eight-pebble line configuration shown in Figure 26.18b. 
Figures 26.21 and 26.22 show a representative set of results from an experimental run using the 
line configuration. 

Figure 26.21 Event chart for line configuration. 

The stimulus tone was started near Mote 7 and moved along the line of motes toward Mote 0. 
The swarming network behavior can once again be observed: Mote 7 emits the first cue tone, 
lowering the threshold of its neighbor Mote 6, which then detects the stimulus tone and begins to 
emit a cue tone, with the others following as the stimulus tone source moves through the field. 

Figure 26.22 Power levels at receiver station for line configuration. 

The eight-pebble experiments successfully demonstrated swarming behavior; however, the 
reliability of the intrusion detection is difficult to fully evaluate. 

Inconsistencies in the sensitivity of the acoustic sensors of the motes make it difficult to accurately 
control the sound detection, especially with as few as eight motes. Some microphones are more 
sensitive than others, so a mote that is farther away might detect a sound that a nearer mote may 
not have detected. However, with a larger number of motes and a larger pebble field, this problem 
may not be severe enough to be noticeable. 

These issues, while MICAz mote-specific and unrelated to the swarming concept itself, shed 
light on the various factors that must be addressed when implementing a swarming sensor network 
in a real environment. Even as technology advances, the sensor nodes will always be limited by size, 
power, and cost factors that will affect their reliability and consistency. This provides support to 
the idea that a large number of motes are necessary for the swarming concept to successfully detect 
intruders. This also reinforces the notion that tuning of parameters in the sensor devices is not a 
trivial aspect of using this type of behavior for intrusion detection. 


26.3 Multifunction Array Lidar Network for Intruder 
Detection and Tracking 


26.3.1 Baseline Concept 


With the limitations noted in the swarming pebble discussion [12,13], it was recognized that even 
with the automated wide-area intruder detection that swarming pebbles provide, potentially intense 
operator monitoring and inspection would most likely be required to verify the identity of the 
intruder, for example, to distinguish a person with hostile intent from a wild animal. This led us 
to consider an alternate approach that utilizes a new device: a thin, phased-array laser aperture 
used for a multifunction array lidar (MFAL) that is highly analogous to the microwave phased-array 
antennas used in multifunction array radar (MFAR). The concept offers fully automatic detection, 
intruder type determination, and precision tracking without requiring operator monitoring or 
intervention. The design, performance, and fabrication 
characteristics of this new optical phased array concept were summarized in a recent paper [14]. 

Compared to visible light imaging, passive IR imaging provides the possibility of operation 
both day and night by exploiting the thermal radiance of objects of interest. Using IR cameras 
on the ground, tracking the three-dimensional (3D) position of moving objects is accomplished 
by correlating images of two or more cameras and computing the distance to each object using 
stereoscopic range estimation algorithms. When multiple objects must be tracked, correlation of 
images becomes difficult because each camera may have a different view of the objects unless cameras 
are located very close to each other. Also, stereoscopic range estimation accuracy varies with the 
range to the object. The lidar system described here provides the 3D position and identification of 
the basic shape (for identification) of each object with one sensor, without correlation or estimation 
algorithms, and with high range accuracy independent of range. It also allows multiple lidars to 
collaborate to mitigate obstacle blockages and countermeasures such as lidar blinding. 
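
The range dependence of stereoscopic accuracy can be made concrete with the standard first-order stereo error relation, in which range error grows with the square of range. The comparison with lidar below assumes a pulsed time-of-flight measurement whose error is set by timing precision alone; the text does not specify the lidar ranging method, and the camera focal length, baseline, disparity error, and timing error used here are hypothetical values chosen only for illustration.

```python
C_M_PER_S = 299_792_458.0  # speed of light


def stereo_range_error_m(range_m, focal_px, baseline_m, disparity_error_px):
    """First-order stereo range error: dZ = Z^2 * dd / (f * B)."""
    return range_m ** 2 * disparity_error_px / (focal_px * baseline_m)


def tof_range_error_m(timing_error_s):
    """Time-of-flight range error: dR = c * dt / 2, independent of range."""
    return C_M_PER_S * timing_error_s / 2.0


# Hypothetical camera pair: 1000 px focal length, 0.5 m baseline, 0.5 px disparity error.
stereo_50 = stereo_range_error_m(50.0, 1000.0, 0.5, 0.5)    # 2.5 m at 50 m
stereo_100 = stereo_range_error_m(100.0, 1000.0, 0.5, 0.5)  # 10 m at 100 m
lidar_any = tof_range_error_m(1e-9)                         # about 0.15 m at any range
```

Doubling the range quadruples the stereo error in this model, while the assumed time-of-flight error stays fixed.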

Note that this is a concept study to explore the potential of an MFAL network. The authors 
emphasize that the flat, thin optical phased array, although they believe it to be feasible, has 
yet to be built. 

The concept of the MFAL was enabled by the recent conception of an extremely thin, true 
optical phased array. This optical phased array has allowed, for the first time, a direct extension of 
the capabilities embodied in modern MFARs, such as the U.S. Navy’s Aegis system [15], and in 
sensor netting systems, such as the U.S. Navy’s Cooperative Engagement Capability (CEC) [16]. 

Figure 26.23 illustrates a concept for a network of MFAL nodes. Each node provides part of 
a surveillance fence for azimuthal surveillance coverage, and some elevation coverage, using four 
arrays with 1 cm² aperture areas on the vertical faces of a cube. Three MFAL nodes are shown 
monitoring the area around a building complex. A fourth node is shown on the rooftop of the 
smaller building. This node serves as the remote monitor as well as a backup surveillance node. 
As a remote monitor, it receives data from the other nodes via FSO links established by their optical 
arrays. Each array can be used for either lidar surveillance or FSO communication at any given 
time. Shown in an inset of the figure is a command monitor. The monitor could be automatic with 
an alarm to alert an operator if an intruder is detected and/or identified. 

Figure 26.23 Intruder detection swarming network concept. 

Figure 26.24 Illustration of MFAL node details. 

Figure 26.24 illustrates the configuration of an MFAL node consisting of a pedestal base, 
extendable mast, and the optical array “cube.” The extendability of the mast ensures obstacle 
clearance and adequate range to the optical horizon. As shown in the figure, the pedestal base 
incorporates solar arrays, power supply, processing and control electronics, a Global Positioning 
System (GPS) receiver, and the laser transmitters and receivers. The cube consists of four optical 
phased-array apertures and a GPS antenna. The insets illustrate the structure of an optical array 
including functional sections and magnified features at the micron scale. 

Each face of the cube consists of an optical layer stacked on an IC layer. The optical layer is 
composed of an array of crossed optical waveguides (dark gray and labeled “Array Aperture” in 
the figure), as well as a cascade of multimode interference (MMI) splitters (light gray and labeled 
“Optical Feed”) [14]. Each array is fed along its lower edge by edge-coupled optical fibers leading 
to the base that carry optical power from the transmitter and to the receiver. 

Scattering occurs at the intersections of the array. The waveguides are fabricated in an electro- 
optic (EO) material, and electrodes on each waveguide control the phase of the light at each 
intersection and thus the beam steering. For an N x N array, there are 2N steering electrodes. 
These electrodes are located on the back side of the optical layer where they can be controlled 
by the IC layer, which translates steering directions from the beam controller in the pedestal base 
into voltages for the steering electrodes. A ground plane is established by placing a single, large, 
L-shaped electrode on top of the optical layer over all the steering electrodes. 

The array's rows and columns are spaced pseudorandomly to minimize cross-coupling while 
providing adequate array element density and sidelobe control [14]. The initial design was based 
on using polymethyl methacrylate as the EO material. This material yields an average waveguide 
period of 9 µm. For N = 1000, this spacing yields a total array size of 9 x 9 mm. The steering 
electrodes and MMI splitters occupy about 5 mm in each direction on two sides of the array, 
leading to a face size of about 15 x 15 mm. 
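
The sizing arithmetic in this paragraph can be checked with a short sketch. The variable names are illustrative only, and the 5 mm margin is taken as a single allowance added to each dimension, which is one reading of the text.

```python
# Sketch of the array-face sizing arithmetic; names are illustrative.
N = 1000               # waveguides per side (from the text)
PERIOD_UM = 9.0        # average waveguide period in micrometres (from the text)
MARGIN_MM = 5.0        # electrodes and MMI splitters, per direction (from the text)

aperture_mm = N * PERIOD_UM / 1000.0   # side length of the active array
face_mm = aperture_mm + MARGIN_MM      # side length of one cube face
steering_electrodes = 2 * N            # one electrode per row and per column

print(f"aperture: {aperture_mm} x {aperture_mm} mm")
print(f"face:     {face_mm} x {face_mm} mm (quoted as about 15 mm)")
print(f"steering electrodes: {steering_electrodes}")
```

The computed 14 mm face is consistent with the "about 15 x 15 mm" figure quoted above.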

Notional electronic processing and control components in the base include ICs on multi-chip 
modules for the lidar functions and the communications functions, memory modules, and a GPS 
receiver for time synchronization and for the location fixes used in lidar alignment. Figure 26.25 
shows the modes of the MFAL. 

Figure 26.26 illustrates the basic operation of the MFAL network in the case of an approach- 
ing intruder. Node N1 searches in a pseudorandom search pattern with a 0.01° width beam. 
The description following the figure uses MFAR and sensor network terminology found, for 
example, in [6]. 

When an intruder has been detected, nominally at 0.4 km in an unobstructed view, node N1 
detects motion A against a 3D background “clutter map” and transitions to track dwells B that 
include a contiguous pattern of beams to scan the cross-sectional extent of the object and identify 
its nature (e.g., human, animal, vehicle), and, in so doing, determine whether it appears to be in 
Figure 26.25 Modes of the network nodes. 
Figure 26.26 Network operation with approaching intruder. 


the shape of a real object or a false lidar return. The track state centered on the intruding object 
is transmitted via FSO C to the central monitor node for which node N1 forms a beam in the 
monitor’s direction, and the monitor receives during a periodic communication reception timeslot. 
The monitor receives the node identification, node location, and intruder coordinates from the 
most recent lidar measurement, along with a track state vector for correlation with the returns 
of other nodes. The tracking node also sends the same information to its nearest neighbors (e.g., 
N2) D at beam locations provided by the monitor node. At a predetermined, periodic reception 
timeslot, the neighboring node N2 receives the data from the tracking node N1. The monitor and 
neighboring nodes (e.g., node N2) receive periodic track updates from the tracking node, N1, once 
per second. This cueing information, together with a priori knowledge of the node positions and 
their respective sensor orientations, is then used for the track handover process. The neighboring 
node N2 monitors the track, and when the intruder is determined to be sufficiently close for possible 
detection, an autonomous attempt is made to acquire the target. Should the node N2 successfully 
acquire the target with sufficient track quality, it assumes the track to free up the original tracking 
node (informed via the communication link). Alternatively, both nodes can retain the track and 
report separately to the monitor node for maximum track certainty. 
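
The handover decision described above can be sketched as a small function. This is only an illustration under stated assumptions: the function name, the returned labels, the track-quality scale, and the 0.8 quality threshold are hypothetical; only the 400 m nominal detection range and the two handover alternatives come from the text.

```python
import math

# Hypothetical sketch of the neighbor node's handover decision.
DETECTION_RANGE_M = 400.0   # nominal detection range (from the text)
MIN_TRACK_QUALITY = 0.8     # assumed threshold on an assumed 0..1 quality scale

def handover_action(node_xy, cue_xy, own_quality=None, release_original=True):
    """Decide what a cued neighbor node does with a received track update.

    node_xy, cue_xy: (x, y) positions in meters in a shared GPS-derived frame.
    own_quality: quality of the neighbor's own track, or None if not yet acquired.
    release_original: True to take over the track, False to track jointly.
    """
    distance = math.hypot(cue_xy[0] - node_xy[0], cue_xy[1] - node_xy[1])
    if distance > DETECTION_RANGE_M:
        return "monitor"               # keep following the periodic updates
    if own_quality is None or own_quality < MIN_TRACK_QUALITY:
        return "attempt_acquisition"   # send out acquisition beam patterns
    return "assume_track" if release_original else "track_jointly"

print(handover_action((0.0, 0.0), (1000.0, 0.0)))
print(handover_action((0.0, 0.0), (300.0, 0.0)))
print(handover_action((0.0, 0.0), (300.0, 0.0), own_quality=0.9))
```

Setting `release_original=False` corresponds to the alternative in which both nodes retain the track and report separately to the monitor.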

Multiple beams are required to cover the rather large ambiguity that results from GPS-based lidar 
alignment, which is coarse relative to the highly accurate range and angle measurements of the lidars themselves. If 
there is no return at the direction of the received remote track, the neighboring node continues 
to send out periodic acquisition beam patterns at the latest track location until it either receives 
a return or times out. Upon receiving its own lidar return, the neighboring node then sends a 
series of beams to develop its own track scan E and reports the track updates to the monitor and 
neighboring nodes F. 
Tracking multiple intruders by each node is possible because the track mode need only provide 
dwells for limited target directions, thus minimally impacting the lidar timeline. For example, 
limiting the system to one detection acquisition or two acquisition cues will only decrease the 
search rate by approximately 10%. Although this condition slightly increases a particular volume 
search coverage period, it is tolerable on average because not every search frame will contain an 
interleaved acquisition. 

One advantage of a network of MFAL nodes versus a single node is that if an intruder passes 
near obstacles where one node’s lidar is obstructed, another node at a different vantage point may 
be able to see around the obstacle so that the combination of tracks received and combined at the 
remote monitor may be complete even if no single node can maintain a continuous track. Selection 
of node locations can therefore be made on the basis of uncovering obstructions. Another advantage 
is that if an intruder attempts to jam a lidar node with an optical emitter, other lidar nodes would likely 
not be blinded at the same time because of their diverse locations; if they were, however, the strobes of 
the jamming could be reported to the remote monitor for triangulation. The strobes would also indicate 
that anomalous behavior is occurring in the surveillance area. 


26.3.2 Preliminary Design and Analysis 
26.3.2.1 Preliminary Network and MFAL Design Characteristics 


This section provides design features and parameter values, as well as example calculations that 
support the design concept. Table 26.7 presents MFAL network detection and track parameter 
values and nominal response times. These represent system requirements for an operational mission 
to detect, track, and identify the types of intruders with nominally one sensor node per km². We 
believe that the processing and data speeds support hundreds of track updates and detections per 
second per node. 

Table 26.8 presents the specific design parameters of the MFAL. These parameter values and 
the associated range calculation are based on commercially available laser and receiver components. 

At these powers and pulse widths, care must be taken to avoid damaging the optical components 
because of the small mode-field area (MFA) of the waveguides. The fibers connecting the lasers to 
the array faces have an MFA of about 80 µm², yielding a fluence of about 2 J/cm², well below the 
fiber’s damage threshold of about 50 J/cm². The array itself is a polymeric EO material, with the 
input power divided equally among all of the waveguides. However, as initially conceived in [14], 
the cascading splitter network that divides this power was also polymeric. Coupling directly into a 
single polymer waveguide on each surface would result in a fluence of about 40 J/cm², far in excess 
of the polymer’s approximate 1 J/cm² damage threshold. To solve this problem, the optical layer 


Table 26.7 Characteristics of MFAL Functions 

Detection range                  400 m at 10% reflectivity for human walker 
FSO communication range          1 km 
Volume search coverage           360° azimuth, −0.1° to +0.4° elevation 
Volume search coverage period    <1 s for 2% of volume, 50 s for all beam positions 
Transition to track              <5 s 
Track update rate                1 Hz 


Table 26.8 Design Parameter Values 

Energy per pulse                      1.6 µJ 
Pulse width                           4 ns 
Pulse repetition interval             25 µs 
Pulse integration                     Four pulses noncoherent 
Target dwell time                     100 µs 
Range resolution                      0.6 m 
Wavelength                            1550 nm 
F-number                              3 
Beamwidth                             0.01° 
Pass band of elastic channel          0.1 nm 
Quantum efficiency                    0.2 
Receiver electronic bandwidth         20 MHz 
Preamplifier current noise density    2.12e-12 A/Hz^(1/2) 
Amplifier noise factor                1 
Non-multiplied dark current           0.2e nA 
Multiplied dark current               0.2e nA 
Detector noise factor                 20 
Detector current gain                 100 
Target area (person)                  11,148 cm² (0.7 x 1.9 m) 
Target reflectance                    0.1 
Array loss                            6 dB 


will be composed of multiple materials, with silica waveguides used for the first two stages of the 
cascading MMI splitter network (to go from 1 to 100 waveguides per side of the array) and polymer 
used for the last stage of the cascade (to go from 100 to 1000 waveguides) and the array itself. 
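
The fluence figures in this discussion follow from dividing pulse energy by mode-field area. The sketch below reproduces the fiber value and shows how the per-waveguide fluence falls as the power is split; it assumes an ideal, lossless, equal split, which is a simplification, and the function and constant names are illustrative.

```python
def fluence_j_per_cm2(energy_j: float, area_um2: float) -> float:
    """Pulse fluence in J/cm^2 for a mode-field area given in um^2."""
    area_cm2 = area_um2 * 1e-8          # 1 um^2 = 1e-8 cm^2
    return energy_j / area_cm2

PULSE_ENERGY_J = 1.6e-6                 # Table 26.8
FIBER_MFA_UM2 = 80.0                    # from the text
SINGLE_WAVEGUIDE_FLUENCE = 40.0         # J/cm^2, from the text
POLYMER_THRESHOLD = 1.0                 # J/cm^2, approximate, from the text

fiber_fluence = fluence_j_per_cm2(PULSE_ENERGY_J, FIBER_MFA_UM2)

def per_waveguide_fluence(n_waveguides: int) -> float:
    """Fluence per waveguide after an ideal equal split (assumption: no loss)."""
    return SINGLE_WAVEGUIDE_FLUENCE / n_waveguides

def polymer_safe(n_waveguides: int) -> bool:
    return per_waveguide_fluence(n_waveguides) < POLYMER_THRESHOLD

print(f"fiber fluence: {fiber_fluence:.2f} J/cm^2")
for n in (1, 10, 100, 1000):
    status = "below" if polymer_safe(n) else "above"
    print(f"{n:5d} waveguides: {per_waveguide_fluence(n):6.2f} J/cm^2 ({status} polymer threshold)")
```

Under these assumptions the fluence drops below the polymer threshold only once the power has been divided among more than 40 waveguides, which is why the early splitter stages are the ones that need the more damage-tolerant silica.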

The patterned silica waveguides will be butt-coupled directly to the polymer waveguides 
lithographically. All waveguides will be fabricated on the same substrate. 

The communications mode requires orders of magnitude less power than the lidar modes, 
assuming both transmit and receive directivity. Therefore, the lidar modes dominate the power 
requirement of 30 W. The solar cells covering the 0.64 m² base, as shown in Figure 26.24, provide 
90 W of power during peak lighting conditions, thrice that required to operate the system, allowing 
the batteries to be charged, during operation, at twice the discharge rate. Lithium polymer batteries 
with capacity sufficient for 3 days of sunless operation, 2500 W-h, easily fit within the base. 
Alternatively, the MFAL could operate on line power if greater reliability were required or if a 
higher-power laser, enabling longer range, was employed. 
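
The power budget above can be verified with a few lines of arithmetic. The sketch assumes a constant 30 W load and ignores charging and conversion losses, which the text does not quantify; the names are illustrative.

```python
LOAD_W = 30.0          # lidar-dominated power requirement (from the text)
SOLAR_PEAK_W = 90.0    # peak solar output of the base (from the text)
BATTERY_WH = 2500.0    # lithium polymer battery capacity (from the text)
SUNLESS_DAYS = 3

surplus_w = SOLAR_PEAK_W - LOAD_W              # power left over for charging
charge_to_discharge = surplus_w / LOAD_W       # charging rate relative to the load
needed_wh = LOAD_W * 24 * SUNLESS_DAYS         # energy for 3 days without sun
margin_wh = BATTERY_WH - needed_wh             # spare capacity

print(f"charge/discharge ratio at peak sun: {charge_to_discharge:.1f}")
print(f"energy for {SUNLESS_DAYS} sunless days: {needed_wh:.0f} Wh "
      f"(margin {margin_wh:.0f} Wh of {BATTERY_WH:.0f} Wh)")
```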
As will be discussed later, each array face has its own lidar transmitter and receiver, totaling four 
per node; so, for lidar modes, each array operates in parallel and independently. We also assume a 
single 250 mW fiber communication transmitter/receiver that can be switched into the arrays, one 
array at a time, according to a specific time window reserved for FSO communications. 

Because lidar and communications signals are in the infrared region of the spectrum, at 1550 nm 
wavelength, ranges are affected by significant dust and fog. Lidar range will likely be impaired before 
communications range. For example, at a fog visibility of 100 m, the preceding MFAL 
range would be reduced from nearly 400 to 56 m at 1550 nm wavelength. One all-weather 
compensation approach would be to integrate and intermingle the all-weather, but less-precise, 
swarming microwave network, described previously, with the FSO MFAL network. If weather-based 
FSO impairment (i.e., loss of FSO reception) is observed at the monitor node operating both the 
MFAL and the swarming microwave networks, the monitor could command the MFAL network 
to shut down and the swarming network to start up as a backup capability, thereby conserving 
MFAL power. 


26.3.2.2 MFAL Surveillance, Acquisition, Tracking, and Identification 


In describing an example array search and track strategy, we apply the parameter values of 
Tables 26.7 and 26.8; specifically, an aperture beamwidth of 0.01°, a 4 ns pulse length yielding 
approximately 0.6 m range resolution, and a 25 µs pulse repetition interval with 4-pulse noncoherent 
integration yielding a lidar dwell time (to send pulses and receive echoes per beam position) of 
100 µs. Further, because each array is connected to an independent laser source and receiver, all 
four arrays can transmit and receive in parallel. Finally, for such short detection ranges (0.4 km 
calculated for 10% object reflectivity), one wishes to effectively search the volume and update tracks 
every second. 

For each array that scans 90° in azimuth and from −0.1° to +0.4° in elevation per second, 
450,000 contiguous beams would be required. Further, at 100 µs per beam dwell, it would take 
45 s per volume scan at all beam positions. In contrast, only 10,000 dwells are achieved per second 
at 100 µs dwell intervals. To accomplish the volume update with many fewer beams (10,000 
versus 450,000), we recognize that the cross-range beam coverage at 500 m range is only 8.7 cm. 
Therefore, for detection of objects of interest that are, for example, 0.5 m wide and 2 m tall, 
we can choose to only transmit every fifth contiguous beam position in azimuth and every tenth 
beam position in elevation. In this manner, every transmitted azimuth beam position center will 
be separated by about 44 cm, and each transmitted elevation beam position will be separated by 
87 cm (Figure 26.27). 
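
The beam counts and spacings in this paragraph follow directly from the beamwidth, dwell time, and scan sector. A minimal sketch of the arithmetic, using the small-angle cross-range relation and illustrative variable names:

```python
import math

BEAMWIDTH_DEG = 0.01     # Table 26.8
DWELL_S = 100e-6         # Table 26.8
AZ_SPAN_DEG = 90.0       # per array (from the text)
EL_SPAN_DEG = 0.5        # -0.1 deg to +0.4 deg (from the text)
AZ_STEP, EL_STEP = 5, 10 # transmit every 5th azimuth and 10th elevation position

n_az = round(AZ_SPAN_DEG / BEAMWIDTH_DEG)        # contiguous azimuth positions
n_el = round(EL_SPAN_DEG / BEAMWIDTH_DEG)        # contiguous elevation positions
total_beams = n_az * n_el                        # contiguous beams per array
full_scan_s = total_beams * DWELL_S              # time to dwell on every position
dwells_per_s = round(1.0 / DWELL_S)              # dwells available each second

def cross_range_m(range_m: float, angle_deg: float) -> float:
    """Cross-range extent subtended by a small angle at a given range."""
    return range_m * math.radians(angle_deg)

footprint_m = cross_range_m(500.0, BEAMWIDTH_DEG)
az_spacing_m = AZ_STEP * footprint_m
el_spacing_m = EL_STEP * footprint_m
sparse_beams = (n_az // AZ_STEP) * (n_el // EL_STEP)

print(total_beams, round(full_scan_s, 3), dwells_per_s)
print(f"footprint {footprint_m * 100:.1f} cm, az spacing {az_spacing_m * 100:.0f} cm, "
      f"el spacing {el_spacing_m * 100:.0f} cm, sparse beams {sparse_beams}")
```

The sparse pattern uses 1 in 50 beam positions, i.e., 2% of the 450,000 contiguous positions.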

Thus, a human, larger animal, or vehicle would be covered by at least one of the sparsely 
distributed beams. As a further hedge against an intruder slipping through the sparsely sampled 
volume, the system changes which of the contiguous beam positions are covered with pulses every 
second, so that all beam positions would be covered every 45 s if all 10,000 available beams were used. 
By using only 2% of the 450,000 beam positions, we therefore require 9000 beams per second per 
array to adequately search the volume each second, with all beam positions covered every 50 s 
(Figure 26.28). 
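
One way to realize this rotating sparse pattern is to shift the starting offset of the sampled grid each second. The sketch below is an assumption about how such a schedule could be built, not the authors' scheduler: it steps through the 50 possible (azimuth, elevation) offsets in a fixed order, whereas a fielded system might randomize that order.

```python
from itertools import product

AZ_STEP, EL_STEP = 5, 10      # every 5th azimuth, every 10th elevation position
N_AZ, N_EL = 9000, 50         # contiguous positions per array (from the text)

# All 50 possible grid offsets; the ordering is an assumption of this sketch.
OFFSETS = list(product(range(AZ_STEP), range(EL_STEP)))

def frame_beams(frame_index: int):
    """Beam positions (az_index, el_index) transmitted during one 1 s frame."""
    az0, el0 = OFFSETS[frame_index % len(OFFSETS)]
    return [(az, el)
            for az in range(az0, N_AZ, AZ_STEP)
            for el in range(el0, N_EL, EL_STEP)]

# Verify that 50 consecutive frames visit every beam position exactly once.
covered = set()
for frame in range(len(OFFSETS)):
    covered.update(frame_beams(frame))

print(len(frame_beams(0)), "beams per frame")
print(len(covered), "distinct positions after", len(OFFSETS), "frames")
```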

For detection, we develop a 3D clutter map, storing whether an echo was received (a “1”) 
or not (a “0”) at each pulse resolution cell for each of the 450,000 beam positions (only partly 
updated per second, but all ranges and positions updated every 50 s). We use a clutter map rather 
than Doppler detection to enable use of off-the-shelf, low-cost lidar systems. Even for expected 
detections of only 400 m, we assume detections could occur, for example, with higher reflectivity 
Figure 26.27 Beam pattern for two 1 s frames at 0.5 km range. 
Figure 26.28 Beam patterns and timing. 


objects, out to 1 km. Then, 1500 range resolution cells per beam position, 450,000 beam 
positions per array, and four arrays result in a clutter volume of 2.7 Gb. Two percent of the 
memory cells are updated every second, with all cells updated every 50 s. The detection algorithm 
would determine physical motion by detecting changes in a number of contiguous cells over time 
(Figure 26.28). Once it is determined that a grouping of cells of comparable range and angle have 
changed, the MFAL is directed to scan a 20 x 20 beam pattern of every other beam position over 
a 40 x 40 beam position area at that location as a priority interrupt from the search pattern. This 
would cover a 3.5 x 3.5 m cross-range at 500 m. If a significant portion of beam positions in this 
pattern receives echoes at about the same ranges, detection is declared. During each second from 
then on, a track update beam is scheduled at the center of the detected beam pattern. Updates 
are entered into a track filter for each track. If a track update return is not received over several 
seconds, another 20 x 20 beam acquisition is attempted. Therefore, of the 9000 array dwells 
identified for search each second, a multiple of 400 beams will be interrupted for each transition 
to track. 
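
The clutter-map size and the footprint of the acquisition pattern can be reproduced as follows. This is a sketch of the arithmetic only, with one bit per resolution cell as described above; the variable names are illustrative.

```python
import math

RANGE_CELLS = 1500            # range resolution cells per beam position (from the text)
BEAMS_PER_ARRAY = 450_000     # contiguous beam positions per array
ARRAYS = 4
BEAMWIDTH_DEG = 0.01

clutter_bits = RANGE_CELLS * BEAMS_PER_ARRAY * ARRAYS   # one bit per cell
clutter_megabytes = clutter_bits / 8 / 1e6

ACQ_SIDE_BEAMS = 20           # 20 x 20 transmitted beams
ACQ_SIDE_POSITIONS = 40       # spread over 40 x 40 beam positions
acquisition_beams = ACQ_SIDE_BEAMS ** 2
acq_cross_range_m = ACQ_SIDE_POSITIONS * 500.0 * math.radians(BEAMWIDTH_DEG)

print(f"clutter map: {clutter_bits / 1e9:.1f} Gb ({clutter_megabytes:.1f} MB)")
print(f"acquisition: {acquisition_beams} beams covering "
      f"{acq_cross_range_m:.1f} m per side at 500 m")
```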

When a detection has been determined and a track initiated, the node transmits the track state 
data as well as the estimated cross-sectional area of the intruder to the nearby nodes and to the 
remote monitor. At the remote monitor node, the cross-sectional area is an indication of intruder 
size, and the track velocity can indicate whether the intruder is potentially a vehicle traveling beyond 
human speed. The information is used by other MFAL nodes to cue an acquisition of the intruding 
object. This is of value in further verifying the detection and to maintain track by other nodes if 
the intruder passes out of sight or behind obstacles from the originally detecting node. Because of 
GPS position uncertainty much larger than lidar beam and range accuracy, a receiving node will 
provide special monitoring of a beam pattern covering the indicated location out to 5–10 m on 
each side in azimuth, depending on ambiguity calculations for the target and lidar geometry. In 
this special region, the clutter map detector is set to high detection and corresponding false alarm 
probabilities in that area based on the acquisition message of a neighboring node. If the cued node 
makes a detection and transition to a tracking process, it will send a message to neighboring nodes, 
indicating a detection associated with the track state received by the cueing node that sent the 
track state. In this way, a basic swarming behavior is established similar to the swarming intruder 
network in [13]. 


26.3.2.3 MFAL FSO Network Operations 


Because we have 10,000 dwells available and use 9000 for detection, acquisition, and tracking, 
as previously described, we shall reserve 1000 dwells, or 0.1 s per second available for each array, 
to communicate track and identification data via the FSO channel. Note that whereas we assume 
four lidar sources and receivers, one for each array, we assume only one communication source 
and receiver, shared among the four arrays, with each individual array transmitting data only in the 
directions of other nodes. At 100 Mbps, a total of 10 Mb can be transmitted and/or received in each 
0.1 s window. Assuming a simple error detection and correction code of 12 bits per information bit, this 
translates to 0.8 Mb of data per second per node. If 32 bits are used per word, then 26,000 words 
per second could be sent or received from each node in the allocated 0.1 s window each second. 
More detail is now provided concerning the unique FSO network. Typical successful FSO 
networks assume the need for high gain in transmission beams and reception beams via precision 
gimbaled mirror alignment, plus highly agile automatic turbulence correction and automatic gain 
control (AGC). FSO communication in the present concept requires no mechanical gimbals because 
the optical phased arrays provide electronic beam pointing. Also, at 1 km of communication range 
between neighboring nodes, it is sufficient that the transmitter aperture gain be used but significant 
receiver aperture gain is not needed. Finally, because of the short communication distances and low 
data rates, atmospheric turbulence compensation and advanced AGC features are not required. 
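
The FSO data budget quoted earlier in this section (a 0.1 s window at 100 Mbps, 12 coded bits per information bit, 32-bit words) can be checked with integer arithmetic. The names below are illustrative.

```python
LINK_RATE_BPS = 100_000_000      # 100 Mbps (from the text)
WINDOW_MS = 100                  # 0.1 s communication window each second
CODED_BITS_PER_INFO_BIT = 12     # error detection and correction overhead
WORD_BITS = 32

raw_bits = LINK_RATE_BPS * WINDOW_MS // 1000          # bits carried per window
info_bits = raw_bits // CODED_BITS_PER_INFO_BIT       # information bits per window
words = info_bits // WORD_BITS                        # 32-bit words per window

print(f"raw: {raw_bits / 1e6:.0f} Mb, information: {info_bits / 1e6:.2f} Mb, "
      f"words: {words}")
```

The results round to the 0.8 Mb and 26,000 words per second quoted in the text.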
A variety of network operations schemes can be devised for directive transmission and reception 
using combinations of time-division multiplexing (TDM) and wavelength-division multiplexing 
(WDM). A concept is offered here as evidence that a practical approach is feasible. For network 
initiation, each node is initially set for omnidirectional reception in which the arrays are spoiled 
to an approximate 0 dB gain (±3 dB). The monitor sends out interrogation beams, at low data 
rates, indicating GPS time and location and a node-responsive time window. The monitor beam 
sweeps in a 360° azimuth “interrogation” pattern. As individual nodes receive the interrogation, 
generally at different times, they respond with high gain transmit beams pointing toward the 
monitor node location within the indicated time window, during which time the monitor arrays 
are set to omnidirectional gain. A number of potential interrogation response WDM channels are 
available from which each node is randomly assigned for transmission of a response. The monitor 
can receive multiple responses at different WDM channels simultaneously for channel decoding 
during the response time windows. Alternatively, all nodes could use the same wavelength and, 
using GPS time synchronization, a TDM structure could be implemented in which each node 
takes turns communicating. It is expected that FSO communications will include appropriate FSO 
error detection and correction coding and a commercial data encryption product. The monitor 
will transmit interrogation beams followed by a listening time window for several cycles to ensure 
all nodes have responded. The monitor will then individually transmit to each node, all of which 
are set to omni receive, with a table of the locations of all reporting nodes in the network. From 
this point onward, both transmission and reception will be directional. 

During each second, with GPS time synchronization and position alignment, each node will 
perform surveillance, tracking, cued acquisition, and intruder type identification functions for the 
first 0.9 s and provide the remaining 0.1 s for transmission and reception of neighboring node 
acquisition cue track updates. During the 0.1 s window, all nodes will set their arrays to receive 
from their nearest neighbors in anticipation of a potential acquisition cue message, and those nodes 
with track cue data will transmit to immediate neighbors via their directive apertures. This time 
is also reserved for reporting detections, track updates, and identification images to the monitor 
node. The monitor node may also transmit a command to individual nodes or all nodes during the 
timeslot each second that is reserved for acquisition cues. 
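
The one-second frame structure described above can be sketched as a simple phase lookup keyed to GPS time. The function name and phase labels are hypothetical, and the sketch assumes frames are aligned to whole GPS seconds, which the text implies but does not state explicitly.

```python
FRAME_MS = 1000                      # one GPS-synchronized frame per second
SENSING_MS = 900                     # surveillance, tracking, cued acquisition, identification
COMMS_MS = FRAME_MS - SENSING_MS     # FSO transmission and reception window

def node_phase(gps_time_ms: int) -> str:
    """Return the activity a node performs at a given GPS time in milliseconds."""
    offset = gps_time_ms % FRAME_MS
    return "sensing" if offset < SENSING_MS else "fso_comms"

for t in (0, 450, 899, 900, 999, 1000):
    print(t, node_phase(t))
```

Because every node derives the same phase from the same GPS time, neighbors are listening during the window in which a tracking node transmits its cue.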


26.3.3 Potential Next Steps 


The initial concept description has been provided for a new type of intruder detection system 
analogous to military microwave phased-array radar and communications systems. The description 
includes functions, timing, component descriptions, and initial calculations. Its feasibility hinges 
on the successful implementation and cost effectiveness of the new optical phased-array aperture 
originally described by Papadakis et al. [14] and further articulated here. Clearly, the prototype 
development of the phased array is the principal next step. Software functions, signal processing, 
sensor node alignment, communication protocols, and timing structure have been successfully 
developed for microwave systems and are not considered high-risk areas. Further, the application 
of interest is much less complex than that of the military microwave systems and would be expected 
to be straightforward to design [17]. 

As observed, the MFAL concept could be applied to other problems such as vehicle colli- 
sion avoidance and control, short-range inter-vehicle communication, and even surveillance and 
communication inside buildings where line-of-sight internode communications are possible. 




26.4 Intersection between the Swarming “Pebbles” and the 
MFAL Surveillance Zone 


The two types of intrusion detection networks are very different; in fact, they are intentionally designed to 
have opposing attributes: 


1. Simple nodes versus highly advanced nodes 
2. Many randomly placed nodes, scalable to a wide area versus carefully placed and focused on 
a specific area 


Such diversity can provide a complementary means to detect intruders; however, having the 
networks interact is not straightforward. The swarming pebbles provide imprecise tracking of 
intruders and cannot distinguish the type of intruder using such simple sensors as microphones 
and IR detectors. In contrast, the MFAL zone is very precise and can determine type of intruder. 
Therefore, without design changes, the two networks cannot automatically cue and alert each other 
in the same or adjacent region. 

There are, however, advantages to designing them to interact. We will consider interaction in 
three scenarios in which a very large swarming pebbles field extends at the perimeter of a monitored 
area and the precise network of MFALs monitors an inner, high-priority perimeter, as shown in 
Figure 26.29. 


26.4.1 Intruder Leaving the Swarming Pebbles Field 
and Entering the MFAL Zone 


As described earlier, the track of an intruder from the monitor center is relatively imprecise. 
A remote monitor antenna beam with a 0.25 m² aperture would be expected to cover a rather 
large area, for example, 0.25 km² from a range of less than 1 km at 2.5 GHz. For 10 GHz pebble 
frequency, the same size aperture could cover a 0.25 mile² spot from about 10 km. Scanning the 
antenna could “centroid” the composite pebble signal as it scans to further reduce the ambiguity 


Figure 26.29 Configuration of an outer zone of swarming pebbles and an inner zone of MFAL 
nodes. Scenario A: Intruder leaving the swarming pebbles field and entering the MFAL zone. 
Scenario B: Intruder leaving the MFAL zone and entering the pebble field. Scenario C: Intruder 
in the midst of a surveillance zone containing both MFALs and pebbles. 
area by standard techniques. However, regardless of approach, it is likely that the intruder track 
would not be more accurate than 100–200 m. Such low accuracy is helpful for large surveillance 
regions. However, it is too imprecise to cue an MFAL as an intruder tracked by pebbles leaves 
the pebble field and enters the MFAL zone. There are some design measures that could provide 
an advantage to the MFAL zone over merely cold detection by MFALs with no input from the 
pebble field. 

As described earlier, an MFAL will scan every fifth beam position over 360° azimuth, and every 
tenth beam in elevation, during every second. Each MFAL maintains a 3D “clutter map” and 
automatically monitors for consistent movement by pixel groupings. If such behavior is detected 
over a number of dwells, a moving object is declared to have been detected, and scanning and 
track update dwells are then scheduled from then on. For fast-moving vehicles, it is likely that 
an MFAL would detect such movement in only two rotations, i.e., within 2 s, because the pixel 
motion would be obvious. For slow-moving humans, an MFAL might require four rotations, or 
4 s, before enough pixel movement is determined to declare a “detection.” 

Even if the remotely monitoring command and control unit of the pebble field can send a track 
accurate to within 200 m for an approaching intruder, the closest MFAL would not be able to 
use the track to cue with sufficient accuracy to provide a significant time advantage. However, the 
MFAL could reset its 1 s rotation cycle to span the 200 m cross-range, thereby potentially saving 
a fraction of a second. And if the presence of the cue could be used to allow a more sensitive pixel 
movement detection threshold, then perhaps the MFAL could make a detection one cycle (second) 
sooner, say after three cycles rather than four. Therefore, direct cueing from the pebbles to the 
MFALs does not significantly enhance performance. 

If, however, the pebbles also produced a 1550 nm tone in addition to the microwave tone 
monitored by its neighbors and the central monitor antenna, then they could provide a beacon to cue 
the MFAL with high accuracy. For this design case, suppose that those pebbles bordering the MFAL 
zone carried optical tone emitters in the same wavelength band of the MFAL communications 
transceivers, but at a wavelength separate from those of the communications channels. The MFAL 
could be designed to receive the tone and perform a cued acquisition. 

To accomplish this, a design change would also need to be made to the beam scheduling. 
As presently designed, the MFAL would not detect the tone until the MFAL trained a beam in 
the precise direction sometime within the 50 s full coverage. Otherwise, for every fifth beam per 
second, there would only be a 20% chance of alignment of beam and tone per 1 s scan cycle. A new 
mode would need to be added to the MFAL to allow it to passively sweep the horizon for beacon 
detections with a spoiled beam width with wider azimuthal spread. A lower gain, broader azimuthal 
laser beam would still allow adequate received signal strength and yet allow for faster scanning of 
the 360° of azimuth. If the azimuthal beamwidth were spoiled from the nominal 0.01° to 0.1°, it is 
possible that within 1 ms the beam could be scanned continually through the full 360° horizon, to 
receive, detect, and locate any optical pebble tone. The beacon could then be used to self-cue the 
MFAL to acquire the emerging intruder with the beam spread mentioned in the previous section. 
Adding such a feature only requires additional functionality to be programmed into the MFAL 
control element. As for the pebbles, at least those near the boundary with the MFAL zone would 
need to be specially manufactured to emit not only microwave cue tones, but also optical cue tones 
of sufficient strength for nearby MFALs. 
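
To see why spoiling the beam speeds up the horizon sweep, the sketch below counts azimuth beam positions at the nominal and spoiled beamwidths and derives the time available per position for a 1 ms sweep. The per-position figure is an inference from the numbers in the text, not a value stated by the authors, and the names are illustrative.

```python
AZIMUTH_DEG = 360.0
NOMINAL_BEAMWIDTH_DEG = 0.01   # Table 26.8
SPOILED_BEAMWIDTH_DEG = 0.1    # spoiled receive beam (from the text)
SWEEP_S = 1e-3                 # passive horizon sweep time (from the text)

def azimuth_positions(beamwidth_deg: float) -> int:
    """Number of contiguous beam positions needed to cover the full horizon."""
    return round(AZIMUTH_DEG / beamwidth_deg)

nominal_positions = azimuth_positions(NOMINAL_BEAMWIDTH_DEG)
spoiled_positions = azimuth_positions(SPOILED_BEAMWIDTH_DEG)
time_per_position_s = SWEEP_S / spoiled_positions   # derived, not from the text

print(nominal_positions, spoiled_positions)
print(f"time per spoiled position: {time_per_position_s * 1e9:.0f} ns")
```

Spoiling by a factor of 10 reduces the number of azimuth positions by the same factor, at the cost of lower receive gain.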

In summary, the required design changes for pebble field cueing of MFALs are 


■ Optical tone capability in pebbles near the boundary allowing dual microwave and optical 
  transmission upon detection of an intruder 
■ Optical cue tone beacon reception via the MFAL communication transceiver when the 
  scanning beam is pointed at the pebble 
■ Spoiling of the MFAL receive beam in the horizontal direction to facilitate rapid scanning 
  of the horizon for beacon signals 
■ Addition of a beacon scanning mode in the MFAL to scan 360° for pebble beacon tones 
  once per second 


With these design changes, an intruder moving from the pebble field into the MFAL field could be 
acquired and identified by the nearest MFAL in less than 2 ms, which is the time of a passive scan 
(1 ms) plus the time to send out the 20 by 20 track beam acquisition pattern (0.4 ms) during a 
single one-second time sweep. This latency, ranging from less than 2 ms to less than 1 s depending on 
beacon scanning timing, contrasts with the 2–4 s required for uncued detection and transition to track by an MFAL. 
Such a cueing design measure is likely most useful for high-speed intrusion such as a vehicle. 


26.4.2 Intruder Leaving the MFAL Zone and Entering the Pebble Field 


In this case, the intruder track is very accurate and could, in principle, be used to cue a limited 
set of pebbles in the direction that the intruder is headed. Again, as for the aforementioned case, 
special design measures would be needed for cueing to minimize tracking discontinuity during the 
time required to reestablish a pebbles track. For example, without track continuity, it is possible 
that an intruder would penetrate tens of meters into the pebbles field before a new pebbles track 
was established. If the intruder changes direction abruptly during that time, it is possible that the 
new track would not be considered the same object as the previous precise MFAL track because of 
the tracking gap in combination with a change of direction and accuracy differences. 

Beginning with the simplest design option, the MFAL could send its precise track coordinates to
the swarming pebbles field command center, and the monitor antenna could then send a command
signal to the pebbles nearest the MFAL track to switch them to the greater cued detection
sensitivity. This would increase the probability of early detection as the intruder enters the field
and so minimize the tracking gap. However, with such a large command/monitor antenna footprint,
nominally 0.25 km², so many pebbles could be commanded to cued detection sensitivity that a
false alarm instability could be triggered, as mentioned earlier.

For the selected cold and cued detection settings of the pebbles identified previously, it is desired 
that only pebbles within about plus or minus 10 m of the last MFAL coordinates for the intruder 
be set to cued detection sensitivity as the intruder enters the pebble field. To match such a narrow 
20 m wide sector covered by the MFAL beam from 500 m range would require a microwave 
antenna beam at the MFAL location to cue a narrow swath of pebbles with only a 2.3° azimuthal 
beam width. Even at the relatively high microwave frequency of 10 GHz, this would require an 
unacceptably large antenna. 
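
The 2.3° figure follows directly from the geometry, and a rough aperture estimate shows why a microwave antenna is impractical here. The sketch below is a back-of-the-envelope check, not the authors' calculation: the function names are hypothetical, and the aperture size uses the common rule of thumb that the half-power beamwidth of a dish is roughly 70λ/D degrees, which is an assumption not stated in the text.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def sector_beamwidth_deg(half_width_m, range_m):
    """Full angle subtended by a sector of +/- half_width_m at range_m."""
    return math.degrees(2.0 * math.atan(half_width_m / range_m))


def aperture_diameter_m(beamwidth_deg, freq_hz, k_deg=70.0):
    """Approximate dish diameter for a given half-power beamwidth.

    Assumes the rule of thumb beamwidth ~ k_deg * wavelength / diameter,
    with k_deg = 70 as a typical (assumed) value.
    """
    wavelength_m = C_M_PER_S / freq_hz
    return k_deg * wavelength_m / beamwidth_deg


bw = sector_beamwidth_deg(half_width_m=10.0, range_m=500.0)
d = aperture_diameter_m(bw, freq_hz=10e9)
print(f"required beamwidth: {bw:.2f} deg")
print(f"approximate aperture at 10 GHz: {d:.2f} m")
```

With these assumptions the aperture comes out at roughly 1 m, which is large compared with a compact sensor node.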

Therefore, as for the previous case, an optical solution is warranted. By implementing a simple 
optical receiver and wide field-of-view aperture in the pebbles at the edge of the pebble field 
bordering the MFAL zone, they would be able to directly receive a precise optical cue tone. The 
MFAL could emit a cueing frequency via its communication transmitter at a uniquely different 
wavelength from its radar and communications band, as for the previous situation. There would 
be sufficient gain for the MFAL to spoil its 0.01° beam to, perhaps, 0.1°, as mentioned earlier, 
and briefly (within a millisecond) transmit the cue tone for a sequence of beams over the 2.3° area 
where the MFAL track enters the pebble field. This could serve to only cue those pebbles most 
likely to detect the intruder, and perhaps maintain track continuity. 
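
As a rough check on the optical cueing scheme, the number of spoiled beams needed to cover the entry sector, and the gain given up by spoiling, can be estimated as follows. This is a sketch under stated assumptions: the function names are hypothetical, and the gain loss assumes the beam is broadened equally in both angular dimensions, which the text does not specify.

```python
import math


def cue_beam_count(sector_deg, spoiled_beam_deg):
    """Number of adjacent beams needed to tile the sector in azimuth."""
    # The small tolerance guards against floating-point round-off.
    return math.ceil(sector_deg / spoiled_beam_deg - 1e-9)


def spoil_loss_db(native_beam_deg, spoiled_beam_deg):
    """Gain reduction from beam spoiling (assumes broadening in both axes)."""
    return 20.0 * math.log10(spoiled_beam_deg / native_beam_deg)


beams = cue_beam_count(sector_deg=2.3, spoiled_beam_deg=0.1)
loss = spoil_loss_db(native_beam_deg=0.01, spoiled_beam_deg=0.1)
print(f"beams to cover the sector: {beams}")
print(f"gain given up by spoiling: {loss:.0f} dB")
```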




26.4.3 Intruder in the Midst of a Surveillance Zone Containing 
Both MFALs and Pebbles 


As identified earlier, it may be of advantage to extend the pebble field into the MFAL zone for the 
purpose of ensuring all-weather tracking capability. In this manner, when fog or rain attenuation 
reduces the MFAL range, the pebble field tracks can maintain surveillance coverage in the inclement 
weather. The pebble field tracks are, however, orders of magnitude less accurate than the MFAL
tracks. Therefore, even when both colocated networks are operating simultaneously before or
after the inclement weather, the tracks of one network are likely to correlate automatically at a
command/control center with the tracks of the other only if intruding objects are separated by
more than the track accuracy of the less accurate network, the pebble field. Because of this, it is
probably most expedient to use only the MFAL tracks in good weather and to resort to pebble
field tracks only in severely degraded weather.


26.5 Autonomous Mobile Tracking Sensor Nodes 


Especially for the MFAL network, the individual terminals are sufficiently compact to make them 
attractive candidates to mount on small autonomous vehicles. A notional view of such a vehicle is 
shown in Figure 26.30, using an unattended ground vehicle chassis. In this way, the mobile MFAL 
nodes could also be designed to automatically run an intercept course to engage an intruder for 
purposes such as (a) forcing a halt to the intruder, (b) providing a warning, or (c) furnishing a 
video/audio feed to the command center to visually identify and communicate. One of the issues 
that would arise is that there would need to be net-wide coordination. A completely autonomous, 
uncoordinated set of MFAL nodes, all capable of moving toward an intruder, or multiple intruders, 
could leave coverage gaps in the network while they intercepted. 

Coordination could be performed via the optical communications channels. Coordination 
options could include (a) the first node to detect an intruder moves to make the interception or (b) 
the unit that has the shortest time required to make the intercept along the intruder trajectory moves 
to complete the interception. This could be a centrally commanded operation, or, alternatively, 
each node could perform its own intercept calculations and communicate with the others to confirm
its intent to intercept. It may be best to include several such intercept logic options, as one may
be more useful than another depending on the behavior of the intruder. For example, a high-speed
vehicle that is moving faster than an intercepting node can travel would motivate the
first-to-intercept logic of option (b) over the first-to-detect logic of option (a).

Figure 26.30 Example of mobile MFAL vehicle.
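
The two coordination options can be compared with a small sketch. It is illustrative only: the `Node` and `Intruder` classes, the constant-velocity intruder model, and the straight-line node motion are all assumptions introduced here, not part of the design described in the text.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    x: float
    y: float
    speed: float        # maximum node speed, m/s
    detect_time: float  # time at which this node detected the intruder, s


@dataclass
class Intruder:
    x: float
    y: float
    vx: float
    vy: float


def intercept_time(node, intruder):
    """Earliest t >= 0 with |P + V*t - N| = speed*t, or None if unreachable."""
    rx, ry = intruder.x - node.x, intruder.y - node.y
    a = intruder.vx ** 2 + intruder.vy ** 2 - node.speed ** 2
    b = 2.0 * (rx * intruder.vx + ry * intruder.vy)
    c = rx * rx + ry * ry
    if c == 0.0:
        return 0.0
    if abs(a) < 1e-12:  # equal speeds: the quadratic degenerates to linear
        return -c / b if b < 0 else None
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    valid = [t for t in ((-b - root) / (2 * a), (-b + root) / (2 * a)) if t >= 0]
    return min(valid) if valid else None


def first_to_detect(nodes):
    """Option (a): the first node to detect the intruder moves."""
    return min(nodes, key=lambda n: n.detect_time)


def first_to_intercept(nodes, intruder):
    """Option (b): the node with the shortest feasible intercept time moves."""
    timed = [(intercept_time(n, intruder), n) for n in nodes]
    feasible = [(t, n) for t, n in timed if t is not None]
    return min(feasible, key=lambda p: p[0])[1] if feasible else None


# A 20 m/s vehicle passes node A, which can only move at 10 m/s.
intruder = Intruder(x=0.0, y=0.0, vx=20.0, vy=0.0)
node_a = Node("A", x=-50.0, y=0.0, speed=10.0, detect_time=0.0)
node_b = Node("B", x=200.0, y=50.0, speed=10.0, detect_time=1.5)
nodes = [node_a, node_b]
print("first to detect   :", first_to_detect(nodes).name)
print("first to intercept:", first_to_intercept(nodes, intruder).name)
```

In this scenario node A detects first but can never catch the faster vehicle from behind, while node B, ahead of the trajectory, can cut it off. This is the situation in which shortest-time-to-intercept logic is preferable.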

The network would also be required to have reserve coverage capability to compensate for a 
node moving out of position to make the intercept. Alternative approaches include (a) packing the 
sensor nodes more tightly than otherwise required, so there is more sensor overlap or (b) placing 
reserve mobile nodes in the zone to fill in for gapped positions. In the latter alternative, there would 
need to be a sufficient number of reserve nodes so that the time lag for the closest such reserve unit 
to move into a gapped location is acceptable without compromising coverage continuity.

Edge monitoring, RM, 166-167
EDPA, see Energy-aware dynamic proposal algorithm 
(EDPA) 
Efficient XML interchange (EXI), 70
Eigensystem realization algorithm (ERA) 
Markov parameters, 83 
procedures, 84 
Welch's method, 84 
E/L-aware algorithms, see Energy and lifetime-aware 
algorithms 
Electronic product code (EPC) 
bits prediction and classification, 542-543 
encoding schemes analysis, 540 
Energy and lifetime-aware (E/L-aware) algorithms, 
145-146 
Energy-aware (E-aware) algorithms, 145 
Energy-aware dynamic proposal algorithm (EDPA), 126 
Energy-efficient networking protocol design 
exploring link correlation, 413-417 
low-duty-cycle, 412-413 
Moore's law, 411 
opportunistic routing, 417-422 
WSNs, 411 
Energy-efficient opportunistic routing protocol design 
broadcasting, 417 
data delivery phase, 419 
E²R, see E²R protocol
ExOR and OPRAH, 418 
fixed-forwarder and probabilistic approaches, 417 
RMD, see Route metric discovery (RMD) 


Energy network behavior, 346 
EPC, see Electronic product code (EPC) 
ERA, see Eigensystem realization algorithm (ERA) 
Erika enterprise (EE) OS, 262-263 
E²R protocol
data delivery, 422 
maintenance state, 419-420 
RMD, 420-421 
Error analysis 
J48 tree classifier, 46, 48 
Naive Bayes, 46, 48 
SVM linear classifier, 46, 48 
testing set, SVM, 46, 48 
training and testing set, Bayes network, 46, 49 
ESP, see Extensible sensor stream processing (ESP) 
Event detection, WSNs 
centralized data processing vs. decentralized event 
detection, 442 
definition, 442 
detection accuracy 
distributed event detection, different deployments, 
454 
events, 453-454 
energy consumption 
ACC-Logic, 456 
distributed event detection and recognition, 455 
event processing, 455 
and network lifetime, system configurations, 
455-456 
exemplary platform, see AVS-Extrem process 
one-/multi-dimensional motion pattern, 443 
physical sensory domain transformation, 443 
state of the art, see State of the art 
EXI, see Efficient XML interchange (EXI) 
Exploring link correlation, energy-efficient networking 
protocol design 
ACKs, 415-416 
CF, 415 
CPRP, 414-415 
flooding algorithms, 415 
MICAz nodes, 414 
open parking lot and indoor office, 414 
performance, outdoor linear network experiment, 
416, 417 
PRR, 414 
Ps(N;|N3) and ACKs, 416 
quality, individual links, 413-414 
RBP, 416-417 
Extensible markup language (XML) 
compression, 70, 71 
defined, 68 
file, 68, 70 
Extensible sensor stream processing (ESP) 
architecture, 308 
metaphysical independence layer, 308
pipeline approach, 307
programmable stages, 307
sensor data cleaning infrastructures, 307
SMURF, 308
and STREAM, 307


F 


Facet chain algebra, 99-101 
False alarm monitoring, 603-604 
False-negative cleaning 
analysis phase 
tag stream, 548 
visual representation, 548 
Bayesian network, 554 
comparison experiment, 555-556 
description, 547-548 
experimental evaluation 
assumptions, 553 
CDL formulae, 553 
database structure, 553 
environment, 553 
RFID anomalies, 553 
scoring system, 553 
intelligence phase 
ANN, 549 
input and output, 549-550 
NMR classifier, 550 
permutation 2 and 3 logic engines, 550, 551 
permutation 4 and 5 logic engines, 550, 552 
permutation 1 logic engine, 550 
permutations formations, 549 
loading phase, 552 
neural network, 554 
NMR, 554-555 
False-positive cleaning 
classifier phase 
Bayesian network, 557-558 
neural network, 558 
NMR, 558, 559 
experimental results and analysis 
Bayesian network, 560 
comparison experiment, 561-562 
environment, 560 
neural network, 560-561 
NMR, 561 
feature set definition phase, 556-557 
modification phase, 559 
Fast infoset, 68 
Fault tolerance and errors, 474-475 
FC, see Fusion center (FC) 
Federal Communications Commission (FCC) 
measurements, 207 
Finger pulse oximeter, 583-584
Fire activities and sensor measurement
algorithm complexity, 37 
BA, see Burnt area (BA) 
training values estimation, data, 36-37 
Fire weather index (FWI)
classes, 39-40 
fire classification performance, ranking, 40 
F-score, 40, 41 
Fit ratios (FRs), 286-287 
Floyd-Warshall algorithm, 464 
Frame differencing (FD) enforcement, 249 
Frequency domain methods 
advantage and disadvantages, 273 
CMIF, 273-275 
FRFs, 272-273
polynomial coefficients, 273 
rational fraction polynomial and peak picking, 273 
SDOF and eFRF, 275 
Frequency response functions (FRFs), 272-273, 290-292 
FRFs, see Frequency response functions (FRFs)
FRs, see Fit ratios (FRs) 
Fusion center (FC), 224 
FWI, see Fire weather index (FWI) 


G 


Gait context extraction 
detection system, 185 
multicluster sensing, 185 
testing phase, 186 
training phase, 185-186 
Gait context identification 
PCA signal projection 
contextual bases, principal components analysis, 
195, 196 
feature clusters, 195, 196 
probabilistic matrix factorization model, 196-198 
Gait sensor networks, see NDDs, gait sensor networks 
GAP, see Generalized assignment problem (GAP) 
Gauss-Markov signal correlation model, 228-229
Gauss-Markov signals, 240
Generalized assignment problem (GAP), 137-139 
Gradient-type algorithm, see Primal-dual interior point 
method (PDIP) 
Greedy algorithm, see Centralized algorithm, 
sensor-mission assignment 
Greedy-type algorithm, see Orthogonal matching pursuit 
(OMP) 


H 


HCPs, see Hidden context patterns (HCPs) 
HF, see High frequency (HF) 
Hidden context patterns (HCPs)
Bernoulli Log-Likelihood, 188
binary pattern detection, 188 
detection and missing probability, 187 
gait identification problem, 186 
gait pattern models, 186 
and OSD interpretation, 187 
Hidden Markov model (HMM) 
dependency assumptions, 11 
disadvantage, 12 
probability distributions, 11 
Hidden semi-Markov model (HSMM), 12 
Hierarchical sensor network 
description, 506 
estimation error difference, 507-508 
initial topology and after 250 s, 506, 507 
operational events, 506 
topology, 506 
High frequency (HF), 574 
HMM, see Hidden Markov model (HMM) 
HSMM, see Hidden semi-Markov model (HSMM) 


I


I²C, see Inter-integrated circuit (I²C)
ICD, see Implanted cardiac defibrillator (ICD) 
IDT, see Incumbent detection threshold (IDT) 
Image processing, parking space occupancy detection 
algorithm pseudocode, 254-256
background modeling, 250-252 
BS approach, WMSNs, 249 
CI, see Confidence index (CI) 
parking space status analysis, 249-250 
status change detection 
acquisition time equal, 1 fps, 253 
BS and FD approach, 251-253 
frame rates, 253 
Implantable Doppler flowmeter, 584, 585 
Implanted cardiac defibrillator (ICD), 578-579
Incumbent detection threshold (IDT), 212 
Information retrieval (IR), 32, 33, 35 
Integer program (IP), 120, 136, 146 
Inter-integrated circuit (I²C), 58, 59
Internet protocol (IP)
data representation methods, 68, 70-71 
functionality, 71-73 
web service stack, 66-70 
Intruder blockage detection analysis 
blockage-sensing transmission, 598-599 
description, 598 
diffraction calculations, 599 
and false alarm analysis, 600 
false alarm monitoring, see False alarm monitoring 
illumination frequencies, 601 
microwave band, 599 
network sensitivity, 601-603
network stability, 600
power reduction vs. distance, 599 
probability vs. SNR, 20 dB signal case, 600, 601 
pulsed system, 601 
test tones, gain and sensitivity control, 603 
thresholds, 600-601 
two-dimensional plot, power reduction, 599-600 
Intrusion detection networks 
description, 590 
intersection, see Swarming pebbles and MFAL 
surveillance zone 
low-power microwave tones, 590 
MFAL nodes and FSO, 590-591 
MICAz, 590 
multifunction array lidar network, see Multifunction 
array lidar network 
tone-based swarming detection network, see 
Tone-based swarming detection network 
IP, see Integer program (IP); Internet protocol (IP) 
IR, see Information retrieval (IR) 
ISO 18000-6c and ISO 18000-7, 581-582 


J 


JN, see Joint node (JN)
Joined Q-ary tree, 541-542
Joint node (JN), 233
Joint sparsity models (JSMs), 390
JSMs, see Joint sparsity models (JSMs) 


K 


Kalman filters 
attacks detection, 403-405
RM, 165 
K-means clustering, 18-19 
K-nearest neighbor (k-NN) algorithm, 15-16 
Knowledge representation (KR), see KR and reasoning, 
resilient sensor networks 
Kolmogorov-Smirnov test (K-S test), 343 
KR and reasoning, resilient sensor networks; see also 
Reasoning algorithm 
ASCs, 95 
binary predicates, 99 
facet chains, 99-103 
m-ary predicate, 98-99 
path algebra, 97-98 
sensing system design, power systems, 104-112 
simplicial complexes, 95-97 
SMS, 94-95 
target complex, 103-104 
K-S data behavior, 347 
K-S test, see Kolmogorov-Smirnov test (K-S test) 
Kullback-Leibler (KL) divergence, 190


L


Least mean square (LMS) filter, 164, 166 
Lifetime and energy-efficient coverage, 474 
Lifetime-constrained environments, sensor-mission 
assignment 
achieved profits fraction, 147-149 
characterization, partial profit, 145 
dynamic problem, 143 
effect, initial battery lifetime, 149-150 
energy and lifetime-aware algorithm, 
145-146 
energy-aware algorithm, 145 
extant nodes fraction, 148, 149 
factors, 144 
fraction, alive nodes, 149, 150 
LP relaxation, 146, 147 
mission leader, 144 
preemption, 144 
simulation setup, 147 
Linearized Saint-Venant model, 401 
Linear principal regression (LPR), 194 
LLR, see Log-likelihood ratio (LLR) 
LMS filter, see Least mean square filter 
Local evaluation 
definition, 443 
human motions recognition, 443 
Localized algorithms coverage, 473-474 
Logical relationship (LR), RfSP protocols
average transmission energy to communicate, 234 
maximum performance-gain-to-energy ratio, 234 
node and parent-child links, 234 
TR and source-destination pair, 234 
ZigBee networks, 234-235 
Log-likelihood ratio (LLR), 224 
Lossless compression, 382 
Lossy compression, 382 
Low-duty-cycle network protocol design 
communication energy and delay, 412 
DSF, 413 
link-layer designs, 412-413 
packet reception, 412 
physical-layer transmission rate scaling, 412 
system-wide dissemination, configurations and code 
binaries, 413 
Low frequency (LF), 574
Low-power solutions, WPSNs 
data-driven architecture design flow methodology, 
525-526 
description, 517 
distributed architecture design, see Distributed 
architecture design 
low-power circuit techniques, WSNs, 517-518 
novel low-power data-driven coding paradigm, see 
Novel low-power data-driven coding paradigm 
LPR, see Linear principal regression (LPR)


M


Machine learning algorithms 
classification, 5, 6 
defined, 3-4 
semi-supervised learning, 24-25 
supervised, see Supervised learning 
unsupervised, see Unsupervised learning 
Machine learning (ML) algorithms 
attribute selection, 45 
Naive Bayes, 42 
tree classifier, 44 
user query, 42-44 
MAP, see Maximum a priori (MAP) 
Markov chain-based parking space model, 249-250 
Markov models, supervised learning algorithm 
HMM, see Hidden Markov model (HMM) 
HSMM, see Hidden semi-Markov model (HSMM) 
Markov parameters, 83-85 
MASs, see Multiagent systems (MASs) 
Maximum a priori (MAP), 191, 192 
MDI, see Metaphysical data independence (MDI) 
Medium access control (MAC) protocols, 387 
Metaphysical data independence (MDI), 308-309 
Micro mesh routing (MMR) protocol 
description, 450-451 
RREQ, RREP and RERR, 451 
ML algorithms, see Machine learning algorithms 
Mobile sensors coverage 
description, 470 
optimal repositioning, see Optimal repositioning, 
mobile sensors coverage 
sweep, see Sweep coverage 
Modal parameters, WSN-based SHM system 
identification, 83 
mode shapes, cantilevered beam, 82, 83 
natural frequencies, 82 
Model-driven data acquisition 
base station, 159 
communication savings, 161 
cost and accuracy, 159 
defined, 158-159 
multivariate Gaussian, 160-161 
optimization loop, 159-160 
MRGAP, see Multi-round generalized assignment 
problem (MRGAP) 
MRPA, see Multi-round proposal algorithm (MRPA) 
Multiagent systems (MASs) 
capabilities and interactions, WSNs, 436 
description, 427 
legacy information system, 430 
radio antennas, cellular network, 431 
Multifunction array lidar (MFAL) network 
advantages, 622 
characteristics, 622
communications mode, 623
and communications signals, 624 
description, 617, 627 
design parameter values, 622, 623 
FSO network operations, 626-627 
GPS-based lidar alignments, 621 
intruder detection swarming network concept, 618 
modes, network nodes, 620 
network operation, approaching intruder, 620, 621 
node details, 619 
notional electronic processing and components, 620 
optical layers, 619 
passive IR imaging, 618 
performance and fabrication characteristics, 618 
problems, 627 
scattering, 619-620 
surveillance and acquisition, 624-625 
tracking and identification, 625-626 
Multi-objective coverage, 473 
Multi-round generalized assignment problem (MRGAP) 
advantages, 138 
GAP instance, linear program, 138 
SMD, 138 
Multi-round proposal algorithm (MRPA) 
defined, 122-123 
mission leaders and sensors, 123 
runtime and message complexity, 124 
Multi-scale SHM, 88 
Multi-sensor systems 
correlated signal models and system equations, 
361-363 
correlations types, 359 
DCS, 361 
description, 359 
intra-and inter-sensor correlations, 360 
intra-sensor correlation, 360 


N 


Naive Bayes, machine learning algorithms, 42 
Natural frequencies, 81, 82 
NCAP, see Network-capable application processor 
(NCAP) 
NDDs, gait sensor networks 
binary generation procedure, 183 
computation, similarity scores, 195 
context awareness model, 186-194 
decision rule, 195 
gait context identification, 195-198 
gait recognition system, 182 
input, context filter, 183, 184 
LPR, see Linear principal regression (LPR) 
NMF, smoothness/sparseness constraints, 198, 200
principle, gait context extraction, 184-186
principle, iSMART system, 182
pseudo-random field, visibility modulation, 200-202 
ROIs, 184 
Network-capable application processor (NCAP) 
defined, 55 
features, 72 
structure, smart transducer, 56, 57 
transducer interface, 57 
Network load and sensor lifetime 
highest network load, 157 
periodic data collection, 157 
radioactivity, 156 
routing trees, 156, 157 
Neuro-disorder diseases (NDDs), see NDDs, gait sensor 
networks 
Neyman-Pearson criterion, 230 
NMR, see Non-monotonic reasoning (NMR) 
Non-monotonic reasoning (NMR) 
classifiers, 539 
false-negative experiment, 554-555 
false positive, see False-positive cleaning 
Nonnegative matrix factorization (NMF), see Probabilistic 
NMF model
Novel low-power data-driven coding paradigm 
data-driven decoder architecture, 520-521 
pulse width coding scheme, 518-520 


O 


Observed sensing data (OSD), 186-187 
OMP, see Orthogonal matching pursuit (OMP) 
Optimal repositioning, mobile sensors coverage 
description, 471 
min-max problem, 471-472 
min-sum problem, 472 
Orthogonal matching pursuit (OMP) 
algorithm, 363, 364 
inputs and outputs, 363 
k steps, 363-364 
SOMP, 364, 365 
sparse set, 364 
OSD, see Observed sensing data (OSD) 


P 


Packet reception ratio (PRR), 414 

Parametric data analysis, modal models 
advantages, 270 
ambient vibration, see Ambient vibration data analysis 
frequency domain, 272-275 
modal parameters and damage detection, 276-279 
physical quantities, 270 
time domain, 271-272
Parking space occupancy detection algorithm pseudocode,
254-256 
Parking space status analysis 
Markov chain, 249-250 
ROI, 249, 250 
Parts-based approach, see Context awareness model 
Path algebra, 97-98 
Pattern recognition algorithms, 446 
PCT, see Probabilistic cluster-based technique (PCT) 
PDF, see Probability density function (PDF) 
PDIP method, see Primal-dual interior point (PDIP) 
method 
Peer-to-peer communication, 233 
Performance evaluation, CS 
correlation signals, 368 
function, SNR, 372-374 
joint vs. separate recovery, correlation degree, 374-375 
notation used, 367 
reconstruction, function of sparsity, see Sparsity 
function 
sensors and measurements relationship, 370-372 
SNR, 367 
SOMP algorithm, 368 
used metrics, 367 
Performance of signal processing (PoSP), 224 
Perimeter coverage 
L-local barrier coverage, 468-469 
problem formulation, 465-466 
strong k-barrier coverage, 466-467 
weak k-barrier coverage, 467-468 
POI, see Points of interest (POI) 
Points of interest (POI), 470
PoSP, see Performance of signal processing (PoSP) 
Prediction-based data collection, WSN 
aggregative approaches, 169-176 
data accuracy, 157-158 
definitions, 155-156 
learning schemes, 176, 177 
model-driven data acquisition, 158-161 
network load and sensor lifetime, 156-157 
RM, see Replicated models (RM) 
Primal-dual interior point (PDIP) method 
algorithm, 366 
Lagrangian function, KKT conditions, 365 
L1 minimization, 364
mapping function, 365-366 
Taylor expansions, 366 
Probabilistic cluster-based technique (PCT) 
description, 544 
efficiency calculation, 544 
frame-size, 544 
rules, 545 
Probabilistic matrix factorization model 
feature clusters, 198 
ROC graphs, PCA and NMF, 198, 199
ROI, NMF, 196, 197
similarity score distributions, 197-199
Probabilistic NMF model
calculation, convenience, 189
cost function, 190
Gaussian distribution, 190
KL divergence, 190
Log-Likelihood, 189
Poisson distribution, 189
Probability density function (PDF), 228 
PRR, see Packet reception ratio (PRR) 


Q 


QoS, see Quality of service (QoS) 
Quality of service (QoS), 208 
Query log approach, 51 


R 


Radio-frequency identification (RFID) 
active and passive, 570-571 
anomalies, 534 
anti-collision techniques, see Anti-collision techniques 
bar code technology, 569-570 
Bayesian networks, 531 
classification, anomalies, 530 
classifiers, see Classifiers, RFID 
collision types, 534 
communication protocols, 574-575 
current applications, health care, 572 
data cleaning model, 309 
data stream, 308 
deferred cleaning, see Deferred cleaning approaches 
description, 530 
deterministic anti-collision protocols, 535 
false positives and negatives, 307 
format, observations, 533-534 
Friis transmission equation, 571-572 
interrogator, 571 
memory, 570 
middleware platform, 309 
NMR, 531 
probabilistic anti-collision protocols, 536-537 
radio waves, 570 
standards and spectrum utilization, see RFID standards 
and spectrum utilization 
statistical sample, tags, 308 
system architecture, 532-533 
tag, 571 
tracking applications, 570 
typical applications, 572 
Random decrement technique (RDT), 85-86
RBP, see Reliable broadcast propagation (RBP)
RDT, see Random decrement technique (RDT)
Real-time coverage 
Floyd-Warshall algorithm, 464 
operation, 464 
trade-offs, quality and duty-cycle, 464 
trapping, 465 
Real-time model 
ICD, 578-579 
medtronic ICD, 579 
optional communication channel, 579 
RFID tag and interrogator, 579 
Reasoning algorithm 
agreement, composite team, 112 
agreement test, 111 
cascading failures, 112 
hypothetical sensor failures, 110 
IEEE-14 bus system, 110
target complex, 103-104 
Received signal strength indicator (RSSI) 
evaluation, 445 
Receiver operating characteristics (ROC) 
graphs, PCA and NMF, 198, 199 
NMF, sparseness and smoothness constraints,
198, 200 
regular and pseudo-random visibility modulations, 
200, 202 
Recovery algorithms, CS 
OMP, see Orthogonal matching pursuit (OMP) 
PDIP, see Primal-dual interior point method 
Reduction tools 
applications, 335 
description, 330 
network infrastructure, 335 
reduction API, 334 
sampling algorithm, 337-341 
sketch algorithm, 341 
wavelet-based algorithm, 335-337 
Region of interest (ROI), 183, 184, 249, 250
Reliable broadcast propagation (RBP), 
416-417 
Remote-controlled pacemaker programmer, 
584-586 
Replicated models (RM) 
defined, 161-162 
spatial modeling, 166-168 
temporal modeling, 162-166 
Request-to-send (RTS), 232 
RERR, see Route error (RERR) 
Responsive surface methodology/design of experiment 
(RSM/DoE), 215 
RFID, see Radio-frequency identification (RFID)
RFID standards and spectrum utilization 
classification, 573, 574 
defined, 572-573
EM, LF and HF, 574
human skin and tissue, 573 
ISO/IEC 18000 family, 573 
U.S. and International standard bodies, 573 
RfSP, see Routing for signal processing (RfSP)
RM, see Replicated models (RM) 
RMD, see Route metric discovery (RMD) 
RMREP 
packet, 418-419 
propagation, 421 
RMS, see Root-mean-square (RMS) 
ROC, see Receiver operating characteristics (ROC)
ROI, see Region of interest (ROI)
Root-mean-square (RMS), 538 
Route error (RERR), 451 
Route metric discovery (RMD) 
packets dissemination, 420-421 
repeated flooding, 418 
RMREP, see RMREP 
Route reply (RREP), 451 
Route request (RREQ), 451 
Routing for signal processing (RfSP) 
advantages, 227 
associated performance metrics, 237 
Chernoff routing, 228-229
combinatorial optimization routing, 229-231
C-RfSP optimization, 235, 236
data-centric routing, 224 
design local fusion rules, 227 
destination node, 236 
development, link metric, 227 
distributed, 226 
D-RfSP, 227, 231-235
energy inefficiency, 226 
estimation variances 
changing energy constraint, 238, 239 
changing sensing range, 238-240 
estimation-gain-to-energy ratios, 236 
Gauss-Markov signals, 240
hierarchical protocols, 224 
joining node and potential parent, 236 
LLR and FC, 224 
MATLAB®, 235-236 
measurements, 225 
metric, 227 
one instance, network and routes, 236-237
PoSP, 224 
signal processing problem, 225-226 
splitting, 227 
WSNs, 223-224 
ZigBee networks, 236 
RREP, see Route reply (RREP) 
RREQ, see Route request (RREQ) 


RSM/DoE, see Responsive surface methodology/design of
experiment (RSM/DoE)
RTS, see Request-to-send (RTS)
Runtime adaptivity approach, 499-502 
Runtime reconfiguration, 502-504 


S 


Saint-Venant model, 399-400 
Sampling algorithm 
central sampling, 339 
overall complexity, 340-341 
random sampling, 339 
steps, 338 
SDOF, see Single-degree-of-freedom (SDOF) 
SEED-EYE camera network 
board development, IPERMOB project, 261, 263 
EE OS, 262 
execution time distribution, 262-263 
high-demanding multimedia applications, 261 
Self-organizing distributed state estimators 
algorithm’s complexity, 496 
communication demand, elements, 496, 497 
computational demand, floating point operation, 497 
data exchange, 485 
design alternatives, 499 
diffusion process, see Diffusion process 
distributed KF, see Distributed Kalman filtering (KF) 
dynamical system architectures, 485 
energy, 484 
global state vector estimation, 485 
implementation, runtime reconfiguration, 502-504 
large-scale dynamical process, 499 
limited communication, 484 
matrix computations, 496-497 
mesh and star network topology, sensor nodes, 484 
model-based design, 497-498 
notation and preliminaries, 486 
problem formulation, 486-487 
runtime adaptivity approach, 499-502 
sensor measurements, 484 
signal processing and implementation, 498 
static sensor networks, 496 
task model, 498-499 
trade-offs, 485 
Self-organizing maps (SOMs), 20-21 
Semi-Markov conditional random fields (SMCRFs), 14 
Semi-matching with demand (SMD), 119, 137, 138 
Semi-supervised learning algorithms, 24-25 
Sensing system design, power systems 
IEEE 14 bus system, 105 
KR, 107-109 
reasoning algorithm, 110-112 
redundancy and agreement, sensors, 106-107 
sequences, fault events, 105 
strategy chains, redundancy checks, 108
Sensor clustering
ARX, 298-299 
time series model, 284-287 
Sensor interfaces and data format 
components, 57 
networking, sensor node, 65-73 
SNs, communication model, 56 
structure, smart transducer, 55, 56 
TIM, see Transducer interface module (TIM) 
Sensor management system (SMS), 94, 105, 112 
Sensor-mission assignment 
budget-constrained environments, 134-142 
centralized algorithm, 121 
centralized approach, 118 
defined, 116-117 
distributed algorithms, 121-126 
dynamic problem 
average achieved profits, 131, 133 
calculation, awarded profit, 121 
exchanged messages, 131, 133 
messages, EDPA, 132, 135 
network lifetime, 132, 134 
trace, network performance, 131, 132 
lifetime-constrained environments, 143-150 
network model, 118 
profit functions, 118 
sensor selection, 117 
static problem 
achieved profits, 127, 128 
achieved rounds vs. profits, 130 
assigned sensors, 129 
fraction, satisfied missions, 129, 130 
IP, 120 
messages, 127, 128 
messages vs. rounds, 130, 131 
profit functions, 120 
SMD model, bipartite graph, 119 
utility contribution, 127 
Sensor networks (SNs) 
analog sensor identification, 64 
communication model, 56 
and coverage 
information security, 460 
Internet and WWW protection, 460 
multiple objectives and/or constraints, 461 
physical and social security, 460 
technology push and application pool, 460 
IP, 66-73 
plain digital sensor plug-and-play mechanisms, 64, 65 
Sensor node hardware, AVS-Extrem, 448-449 
Sensors and unreliable data modeling 
alarm ranking function, 37-41 
correlation, attributes, 49-50 
data mining, 33-35
F-measure performance, 51-52
IR, 33
machine learning algorithms, 41-45 
measurement and fire activity, 35-37 
preprocessing event log, 32 
relevance-based ranking function, 32 
simulation 
analysis, 46-47 
error analysis, see Error analysis 
WEKA, 45-46 
weather data, 50 
Sensor stream characterization 
application requirements, 332 
data sensor, 331, 333 
description, 330 
reduction algorithms, 333 
reduction architecture, 331 
specific architecture, 333-334 
stream information, 333 
Sensor stream reduction 
characterization, see Sensor stream characterization 
conception 
description, 331 
sampling and sketch algorithms simulations, 
345-347 
wavelets algorithm simulation, 344-345 
hypothesis used, 329 
phases, 329-331 
reduction tools, see Reduction tools 
robustness evaluations 
description, 330-331 
elements, 342 
Kolmogorov-Smirnov test (K-S test), 343 
methodology of validation, 342-343 
network optimization, 342 
Serial peripheral interface (SPI), 59-60 
SHM, see Structural health monitoring (SHM) 
Signal-to-noise ratio (SNR), 372-374 
Simulation analysis, training set evaluation 
confusion matrix, J48 tree classifier, 46, 48 
J48 tree classifier, 46, 47 
Naive Bayes, 46 
SVM linear classifier, 46, 47 
Simultaneous orthogonal matching pursuit (SOMP) 
algorithm, 364, 365 
inputs and outputs, 364 
signal reconstructed, 369 
Single-degree-of-freedom (SDOF), 275 
Singular value decomposition (SVD), 272 
Sketch algorithm 
overall time complexity, 341 
pseudocode, 341 
steps, 341 
Smart cameras 
CITRIC and CMUcam3, 248 
Cyclops and Mesheye, 248
performance, vision-based algorithms, 246
platforms characteristics, 246, 247 
Vision Mesh, 248 
WiCa, 247 
Smart transducer interfaces and IEEE 1451 standard 
IEEE 1451 TEDS formats, 61, 63 
NCAP and TIM, 60-61 
unified web service, 61, 62 
SMCRFs, see Semi-Markov conditional random fields
(SMCRFs)
SMD, see Semi-matching with demand (SMD) 
Smoothing component 
average filter, 316 
module identification, 314 
tap-exchange, 317-318 
temporal filter, 315-316 
SMS, see Sensor management system (SMS) 
SMURF 
MDI-SMURF, 309 
RFID data stream, 308 
smoothing filter, 308 
smoothing window size, 308 
spatial-SMURF, 309 
temporal-SMURF, 309 
SNR, see Signal-to-noise ratio (SNR) 
SNs, see Sensor networks (SNs) 
SOMP, see Simultaneous orthogonal matching pursuit 
(SOMP) 
SOMs, see Self-organizing maps (SOMs) 
Sparsity function 
measurement and, 370 
M/K, 370 
probability, reconstruction, 369 
sensors, 370 
signal reconstructed, PDIP algorithm, 368 
signal reconstructed, SOMP, 369 
Spatial modeling, RM 
clique models, 168 
edge monitoring, 166-167 
Spectrum holes, 207-208 
SPI, see Serial peripheral interface (SPI) 
SSI, see Stochastic subspace identification (SSI) 
Stability-plasticity dilemma, 21 
Stanford data stream management system (STREAM), 
307 
State of the art 
algorithmic approach 
anomaly detection, 446 
pattern recognition, 446 
threshold usage, see Threshold detection 
architectural approach 
centralized evaluation, 444 
decentralized evaluation, 444 
distributed evaluation, 444-445 
local evaluation, 443
Steel grid structure
accelerometers, 287, 289 
CAD drawing, 287, 288 
characterization, 287 
damage simulations, 289-291 
description, 287, 288 
dynamic tests, 287-288 
node numbers, 287, 289 
Stirling's formula, 189 
Stochastic subspace identification (SSI), 276 
STREAM, see Stanford data stream management system 
(STREAM) 
Strong k-barrier coverage, 466-467 
Structural health monitoring (SHM) systems; see also 
WSN-based SHM system 
damage detection 
nonparametric methods, 296-299 
parametric methods, 290-295 
time domain and/or frequency domain algorithms,
270
definitions, 268-269 
nonparametric data analysis, time series analysis, 
279-287 
parametric data analysis, modal models, 
270-279 
sensing and data acquisition, 269 
steel grid structure, 287-290 
sunrise bridge, bascule-type movable, 269 
types, 268 
Sun SPOT radiogram protocol, 433 
Supervised learning algorithm 
Bayesian network classifiers, 8-10 
CRFs, see Conditional random fields (CRFs) 
decision trees, 7-8 
evaluation, accuracy, 7 
feature vector, 5-6 
k-NN, see k-nearest neighbor (k-NN) algorithm 
Markov models, 10-12 
selection, 6-7 
SVM, see Support vector machine (SVM) 
training, 7 
training data set determination, 5 
Support vector machine (SVM) 
confusion matrix, linear classifier, 46, 48 
defined, 14 
training set evaluation, linear classifier, 46, 47 
two-dimensional, 15 
SVD, see Singular value decomposition (SVD) 
SVM, see Support vector machine (SVM) 
Swarming pebbles and MFAL surveillance zone, 
628-630 
Sweep coverage 
POIs, 470 
traveling salesman problem, 470-471 
Synchronized sampling instants
local estimate exchange
advantages, 491
consensus strategy, 492
consensus vs. fusion, 495
covariance intersection, 493, 494
ellipsoidal intersection fusion method, 494
error-covariances, 491
fusion-consensus strategy, 493
Gaussian function, 491
mutual mean and covariance, 495
node's local algorithm, 491, 492
scalar weights, 492-493
local measurements exchange
independent, 489
information filter, 489
node's local algorithm, 489, 490


T


TCL, see Tool Command Language (TCL)
TEDS, see Transducer electronic data sheets (TEDS)
Telemedicine
annual expenditure per consumer health (1990-2007),
568
average patient per stay (1980-2005), 568
bar code technology, 569-570
continuous monitoring model, 580-581
description, 568
emergency room visits per 1000 population, 568, 569
inpatient days per 1000 population,
568, 569
ISO 18000-6c and ISO 18000-7, 581-582
outpatient days per 1000 population, 568, 570
real-time, see Real-time model
research
finger pulse oximeter, 583-584
implantable Doppler flowmeter, 584, 585
remote-controlled pacemaker programmer,
584-586
RFID, see Radio-frequency identification (RFID)
sensors
biomedical sensors vs. environmental sensors,
575-576
classification, biomedical, 576-577
oral, 577-578
total health expenditures, 569
Temporal filter, 315-316
Temporal modeling, RM
algorithm, 163
autoregressive, 165-166
constant model, 164-165
defined, 162-163
Kalman filters, 165
LMS filter, 166
Tessellation and coverage, 460
Threshold detection
complex events, 445
drawback, 445
temperature and humidity sensors, 445
Thresholds tuning
algorithm
CI, 259, 260
transition probabilities, 259-260
ground-truth
CI, 257
human-based, 257-258
transition probabilities, 257, 258
values, 256
IPERDS, 256-257
Markov chain, 257
similarity trend analysis, 258, 259
trace, parking spaces, 257
wrong state/background synchronization behavior,
P14, 258-259
Time domain modal parameter identification methods
CEA and PTD, 271
control theory, 271
Hankel matrix, 272
ITD and ERA, 271
Markov parameters, 271-272
SVD, 272
Time Of Arrival for Data (TOAD) 
architecture, 310-311 
belief components 
confidence module, 312-314 
error detection and correction module, 314 
components 
arbitration, 318 
smoothing, see Smoothing component 
data cleaning, 310 
Time series model 
advantages, 279 
AR and ARX, 279-281 
ARIMA, 279 
ARMA, 279, 280 
ARMAX, 280, 281 
ASCE benchmark structure, 279 
conjunction, novelty detection 
AR, 283 
Mahalanobis distance, 283-284 
outlier detection, 283 
RD, 282 
linear matrix equation, 281-282 
polynomials, 280 
sensor clustering, 284-287 
statistical, 280 
TOAD, see Time Of Arrival for Data (TOAD) 
Tone-based swarming detection network 
audio microphone sensors, MICAz motes, 609-611
characteristics, pebble nodes and remote receiver, 593,
594 
Chipcon CC2420 radio chip, 611 
communications transmissions, 592 
connectivity factors, 612 
datasets, 612 
description, 604 
elements, 591, 592 
emergence and swarm intelligence, 591 
ERP, 611 
event chart, 593 
false alarms and intruder track, zone requirements, 
605, 606 
false alarm snapshot, zone requirements, 604-607 
features, 591 
grass field, experiments, 611, 612 
interactive systems, 596 
intruder blockage detection analysis, see Intruder 
blockage detection analysis 
MICAz radio specifications, 611
mote-to-receiver connectivity, 597, 598 
multiple remote directional receivers, 592 
near-neighbor signal cueing system, 596, 597 
noncoherent combining experiment, 612, 613 
pebble nodes’ sensors, 591-592 
pebble sensor range performance, 596, 597 
pebble signaling range performance, 597, 598 
preliminary hardware design, 608-609 
probability of detection (PD), 592 
probability of success, 604 
propagation loss and incoherent tone 
integration, 595 
prototype evaluation 
configurations, 614 
description, 613 
event chart, line configuration, 616 
grid configuration, 614, 615 
power levels, line configuration, 616, 617 
power levels, receiver station, 615, 616 
sensitivity inconsistencies, 617 
RSSI measurement, 611 
sensor nodes, 591 
signal detection theory, 596 
swarming network concept, 593 
system features stationary pebbles, 607 
transmitting pebbles, 594, 595 
vibration frequencies, 592 
zone range requirements, pebbles, 596 
Tool Command Language (TCL), 525 
Transducer electronic data sheets (TEDS) 
defined, 61 
IEEE 1451 TEDS formats, 61, 63 
Transducer interface module (TIM) 
analog, 57, 58 
capabilities, NCAP-transducer interface, 57
full-featured digital, 59
I2C, 59
quasi-digital, 58-59
smart and IEEE 1451 standard, 60-63
SNs, see Sensor networks (SNs)
SPI, 59-60
UART, 60
1-wire, 60
Tree classifier algorithm, machine learning, 44 
Tree routing (TR) protocol, 233 


U 


UART, see Universal asynchronous receiver/transmitter 
(UART) 
Universal asynchronous receiver/transmitter (UART), 
60-61 
Unsupervised learning 
activity recognition projects, 23 
ART, see Adaptive resonance theory (ART) 
clustering, 17-20 
generic mined models, 23 
SmartHouse project, 23 
SN applications, 17 
SOMs, see Self-organizing maps (SOMs) 


W


Warehouse distribution scenarios, 540-541 
Water supply system 
discrete linear state-space model, 402-403 
discretization, 401-402 
linearized Saint-Venant model, 401 
Saint-Venant model, 399-400 
steady-state flow, 400 
Wavelet-based algorithm 
approximation error, 337 
null moments, 336 
periodic wavelet transforms, 336 
pseudocode, 337 
wavelet smoothness, 335 
Wavelet packet transform (WPT), see Wavelet transform 
(WT) 
Wavelet transform (WT), 81-82 
Weak k-barrier coverage, 467-468 
Weather data, 50, 51 
Web service stack, SN 
client-server interaction and message exchange, 66, 67 
cloud platforms, 68, 70 
POST and GET request and response, 66, 67 
structure, TCP/socket and HTTP, 68, 69 
Weighted moving average (WMA), 309 
Wireless links and connectivity, 473
Wireless multimedia sensor networks (WMSNs)
algorithm thresholds tuning, see Thresholds tuning
description, 246
detection performance, algorithm occupancy status,
260-261
image processing, parking space occupancy detection,
249-256
ITS, 246
SEED-EYE camera network, 261-263
smart cameras, 246-248
Wireless passive sensor networks (WPSNs), see Low-power
solutions, WPSNs


Wireless sensor networks (WSNs); see also Data cleaning,
WSN; Prediction-based data collection, WSN;
WSN-based SHM system
agents’ negotiation, see Agents’ negotiation, WSNs
architecture, sensor node, 205, 206
correlated signal models and system equations, see
Correlated signal models and system equations
CR, see Cognitive radio (CR)
data collector and data router, 206
data-driven processor architectures, 526
deployment and spectrum scarcity, 207
description, 205
in event detection, see Event detection, WSNs
flexibility and scalability, 206
fusion, fault tolerance, 207
IEEE 802.11, 206
localization, sensor nodes, 207
low-power circuit techniques, 517-518
multi-sensor systems and observed signal properties, see
Multi-sensor systems
network structure, 356-357
resource limitations, 357
routing and applications, 206
scalability and security, 207
sensor network scheme and conventional compression,
358
sensor network scheme and CS, 358-359
WMA, see Weighted moving average (WMA) 
WSN-based SHM system
AR-ARX and DLAC methods, 81
distributed modal analysis, 85-88
and environmental monitoring, 79
factors, 80
Hedong bridge, 78-79
modal analysis, 82-84
properties, 81
short and long-term monitoring, 78
WSN-Cloud SHM, 88-89
WT/WPT, 81-82
WSN-Cloud SHM, 88-89 
WSNs, see Wireless sensor networks (WSNs) 
WT, see Wavelet transform (WT) 


X 


XML, see Extensible markup language (XML) 


Z 


ZigBee networks 
CS measurement, 392 
pack and forward (PF), 392 
protocol, 233 
rules, 392-393 


